Notation and Definitions


1. $x \sim \mathcal{CN}(\mu, \Sigma)$ denotes a complex Gaussian random vector $x$ with mean $\mu$ and with independent real and imaginary parts, each having covariance matrix $\Sigma/2$.

2. $E[\cdot]$ is the expectation operator.

3. Superscript "$*$" denotes complex conjugation.

4. Superscript "$t$" denotes the transpose of a matrix.

5. Superscript "$H$" denotes the conjugate transpose of a matrix.

6. $[k] = k \bmod T$, where $T$ is the training period.

7. $\lceil a \rceil$ denotes the ceiling of the real number $a$ (i.e., the smallest integer larger than or equal to $a$).

8. $A_{i,j}$ is the element in the $i$th row and $j$th column ($0 \le i, j \le M-1$) of the $M \times M$ matrix $A$.

9. $\mathcal{Z}(p(t))$ denotes the Z-transform of the sequence $p(t)$. Similarly, $\mathcal{Z}^{-1}(P(z))$ denotes the inverse transform.

10. $I_N$ is the $N \times N$ identity matrix.

11. $A \otimes B$ denotes the Kronecker product of matrices $A$ and $B$.

12. $A \odot B$ denotes the Hadamard product of matrices $A$ and $B$.

13. In the appendices, we often use $T' = T-1$ for brevity.
1. Introduction


Practical  communication  systems  devote  part  of  their  power  and  bandwidth  resources  to 
training.  Increasing  the  power  (or  bandwidth)  allocated  to  training  improves  the  channel 
estimate,  but  decreases  the  power  (bandwidth)  available  for  data  transmission.  We  expect 
that  if  the  channel  varies  more  rapidly,  the  frequency  of  training  would  have  to  be 
increased,  and  if  the  channel  is  noisier,  the  training  energy  would  have  to  be  increased.  But 
channel  mean  square  error  translates  non-linearly  into  (uncoded)  bit  error  rate  (BER),  and 
computing  the  impact  on  coded  BER  is  difficult.  To  understand  these  fundamental 
tradeoffs,  we  use  the  cutoff  rate  as  a  metric. 

The channel cutoff rate $R_0$ was first proposed by Wozencraft and Jacobs (e.g., see [34]), and later popularized by Massey [23], who proposed the cutoff rate as the appropriate metric for evaluating the performance of various modulation schemes in coded systems.

The cutoff rate plays a dual role: being a lower bound on the channel capacity, $R_0$ indicates a range of rates $R$ over which reliable communication is possible, $0 < R < R_0$, and it also gives a meaningful bound on the error performance $P_e$ of $N$-length block coding (at any $R < R_0$) via the expression $P_e \le 2^{-N(R_0 - R)}$. In particular, the cutoff rate specifies the largest linear function $R_0 - R$ which lower bounds the random coding exponent of [12]. Contradicting earlier opinion, it is now known that the cutoff rate can be exceeded under finite complexity constraints, with, e.g., turbo coding and iterative decoding schemes [5]. However, the cutoff rate can still be regarded as a "practical channel capacity" for simple encoding/decoding strategies. In particular, $R_0$ has been proven to play exactly this role for the case of sequential decoding [3].

Many  previous  works  consider  the  performance  of  coded  communications  systems 
operating  over  the  Rayleigh  fading  channel  under  the  assumption  that  either  perfect 
channel  state  information  (CSI)  or  no  CSI  is  available  at  the  receiver.  In  the  case  of  no 
CSI,  the  cutoff  rate  [15],  [28]  and  capacity  [1],  have  been  examined.*  With  perfect  CSI, 
the  capacity  has  been  found  in  [13],  [7].  It  is  known  that,  with  perfect  CSI,  the  capacity  is 
invariant  with  respect  to  the  time-correlation  of  the  channel.  Conversely,  with  perfect  CSI 
at  the  receiver,  the  cutoff  rate  is  maximized  for  i.i.d.  fading,  and  generally  decreases  as  the 
channel correlation increases [24]. This apparent contradiction is resolved by noting that
capacity  considers  only  the  horizontal  axis  of  the  random  coding  exponent  (i.e.,  the 
maximum  reliable  rate),  whereas  the  cutoff  rate  characterizes  the  vertical  axis  (i.e.,  the 
magnitude)  of  the  exponent  [19].  The  cutoff  rate  with  perfect  CSI  has  been  examined  for 
i.i.d.  fading  [17],  as  well  as  temporally  correlated  fading  in  [21]. 

*In  this  paper,  CSI  refers  strictly  to  receiver  CSI. 
In the intermediate case of imperfect receiver CSI, one way to characterize reliable rates, and define optimized estimation parameters, is to first fix a particular channel estimation scheme (or "front end"), and then maximize the information theoretic metric of interest (e.g., cutoff rate or mutual information) over the relevant system design parameters. This simplifying approach is used in both [14] and [26] for a training-based, minimum mean square error (MMSE) channel estimation front end. In both cases, the authors considered the mutual information metric.

We consider a periodic training scheme as in [26]. Pilot symbols are sent periodically to provide (inaccurate) estimates of the flat-fading channel coefficient. Knowledge of the channel correlation allows us to predict the fading channel between pilot symbols. We consider three MMSE channel predictors: (a) the Q(1,0) estimator uses only the last pilot; (b) the Q(∞,0) estimator uses all of the past pilots; (c) the Q(1,1) estimator uses the last and next pilots (non-causal). Using the cutoff rate as a metric, we address the following issues: What is the optimal energy allocation between data and training? What is the optimal training frequency? To illustrate the approach, we focus on a first-order Gauss-Markov model that has often been used to characterize fading channels. We quantify our results in terms of a parameter $a$ that measures how rapidly the channel is fading ($a = 0$ corresponds to i.i.d. fading, and $a = 1$ to a static channel). To gauge the performance of our techniques in real-world scenarios, we also consider the Jakes model, as well as the impact of imperfect knowledge of the Doppler spread parameter $a$ at the transmitter.

Major  results  are  as  follows: 


(a) Given the MMSE estimation front end, we derive an expression for the cutoff rate for generalized binary signaling with partial CSI. For any fixed input, the cutoff rate takes on a simple closed-form expression that is amenable to analysis; it is parameterized by the mean square estimation error and the received SNR.

(b) The cutoff rate expression holds for any MMSE estimator that forms its estimate based on any subset of past and future pilots. We provide the mean square error for the three different estimators listed previously.

(c) When BPSK is used, we consider the Gauss-Markov fading channel and derive exact expressions for the optimal allocation of energy between the pilot and data transmission for the Q(1,0) and Q(1,1) estimators. For each of the three estimators, we find a lower bound on the optimal training frequency that is exact at high SNR.

(d) We consider the scenario where the transmitter has incorrect knowledge of the Doppler spread parameter $a$. We find that overestimating $a$ (i.e., assuming that the channel is slower varying than it is) results in a drastic degradation of the cutoff rate. Conversely, the cutoff rate is robust to underestimation of $a$.

(e) We confirm that our analytic insights for optimal training energy and training frequency, based on the Gauss-Markov channel model, are in close agreement with those obtained from a numerical analysis of the more popular Jakes model.

(f)  Among  binary  distributions,  On-Off  Keying  (OOK)  is  optimal  when  no  CSI  is 
available,  and  BPSK  is  optimal  when  full  CSI  is  present.  Motivated  by  this  fact,  we 
derive  an  analytic  design  rule  that  dictates  which  of  these  two  distributions  to  use 
when  only  partial  CSI  is  available  at  the  receiver. 


2.  System  Model 


In this section, we first describe the Gauss-Markov fading channel. Next, we describe the estimation scheme and derive the variance of three different MMSE estimators of varying complexity.

2.1 Gauss-Markov Channel Model

We consider a temporally correlated Rayleigh flat-fading channel model. Typically the Jakes model [18] is used to describe the temporal correlation of the fading process. It is known that second- and third-order autoregressive (Gauss-Markov) models provide excellent fits to the Jakes model [35]. The higher-order models are not analytically tractable and so do not provide ready insights. We consider the Gauss-Markov fading channel whose correlation function is described by a first-order autoregressive process. The Gauss-Markov channel has previously been used to characterize the effect of imperfect channel knowledge on the performance of decision-feedback equalization [25], mutual information [27], and minimum mean square estimation error [10] of time-correlated faded communications links. Letting the subscript $k$ denote the time index, the observed signal at the receiver $y'_k$ is given by

$$y'_k = \sqrt{E_k}\, h'_k s_k + n'_k, \qquad (1)$$

$$h'_k = a\, h'_{k-1} + z_k,$$

where $h'_k \sim \mathcal{CN}(0, \sigma_h^2)$ describes Rayleigh fading, and the coded input $s_k$ is selected from a binary signal set $S = \{A, B\}$ (i.e., $s_k \in S$) and is subject to a unit average-energy constraint: $pA^2 + (1-p)B^2 = 1$, where $p$ is the probability of transmitting $A$. The random variable $n'_k \sim \mathcal{CN}(0, \sigma_N^2)$ describes additive white Gaussian noise (AWGN), and the transmission energy used at time $k$ is $E_k |s_k|^2$. The parameter $a$ ($0 \le a \le 1$) describes the correlation between successive channel states and is related to the normalized Doppler spread of the channel. The Gauss-Markov channel is equivalently specified by its correlation function

$$R_h(\tau) \triangleq E\left[h'_k\, h'^{*}_{k+\tau}\right] = \sigma_h^2\, a^{|\tau|}.$$
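The first-order recursion and its correlation function can be checked numerically. The following is a minimal sketch, not taken from the report: it assumes the driving noise is $z_k \sim \mathcal{CN}(0, (1-a^2)\sigma_h^2)$, which is the variance needed for the process to be stationary with variance $\sigma_h^2$ (the text above does not state the distribution of $z_k$). The function names are hypothetical.

```python
import math
import random


def simulate_gauss_markov(a, sigma_h2, n, seed=0):
    """Draw n samples of h_k = a*h_{k-1} + z_k with stationary variance sigma_h2."""
    rng = random.Random(seed)

    def cn(var):
        # Circularly symmetric complex Gaussian: var/2 per real dimension.
        s = math.sqrt(var / 2.0)
        return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

    h = cn(sigma_h2)  # start in the stationary distribution
    samples = []
    for _ in range(n):
        samples.append(h)
        # Assumption: driving-noise variance (1 - a^2) * sigma_h2 keeps E|h|^2 fixed.
        h = a * h + cn((1.0 - a * a) * sigma_h2)
    return samples


def empirical_correlation(h, tau):
    """Sample estimate of Re E[h_k * conj(h_{k+tau})]."""
    n = len(h) - tau
    return sum((h[k] * h[k + tau].conjugate()).real for k in range(n)) / n


fading = simulate_gauss_markov(a=0.9, sigma_h2=1.0, n=100000, seed=1)
for lag in (0, 1, 3):
    print(lag, round(empirical_correlation(fading, lag), 3), round(0.9 ** lag, 3))
```

With enough samples, the empirical values should track $\sigma_h^2 a^{|\tau|}$ closely.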
2.2 Channel Estimation


To obtain (imperfect) channel state information at the receiver, we assume that pilot (i.e., known) symbols are periodically inserted into the symbol stream. Specifically, we consider a scheme in which a single pilot symbol is inserted periodically into the symbol stream, with period $T$. The choice of periodic pilot placement is natural given the wide-sense stationarity of the channel. Motivation for inserting a single pilot at a time, rather than several, may be found in [2], [10], and [8].

Training is sent once every $T$ transmissions, during the time slots $k = mT$. At each time $k$, an estimate $\hat{h}'_k$ of the channel is made at the receiver. Denoting the estimation error by $\tilde{h}'_k = h'_k - \hat{h}'_k$, the system equation can be written as

$$y'_k = \sqrt{E_{\lceil [k]/T \rceil}}\, h'_k s_k + n'_k = \sqrt{E_{\lceil [k]/T \rceil}}\, \hat{h}'_k s_k + \sqrt{E_{\lceil [k]/T \rceil}}\, \tilde{h}'_k s_k + n'_k, \qquad (2)$$

where the subscript $\lceil [k]/T \rceil$ equals 0 in pilot slots and 1 in data slots, and where we have assumed that the energy allocation for all data transmissions, $E_1$, is constant due to practical constraints, e.g., peak-to-average-power ratio specifications and transmitter complexity (we consider variable-energy data slots in section 5). The input $s_k$ is now selected from a complex signal set $S_{[k]} = \{A_{[k]}, B_{[k]}\}$ and is subject to a unit average-energy constraint: $p_{[k]} |A_{[k]}|^2 + (1 - p_{[k]}) |B_{[k]}|^2 = 1$ for all $k \ne mT$. In the training slots, we assume, without loss of estimator performance, that $S_0 = \{+1\}$. Given the $T$-periodic nature of the estimation process, we require that codewords of length $N' = N(T-1)$, $N \in \mathbb{Z}$, be used.

The particulars of the estimation process follow: At each time $mT + \ell$, an MMSE estimate of the channel is made at the receiver using some subset $\mathcal{N}$ of past (and possibly future) training symbol observations, so that

$$\hat{h}'_{mT+\ell} = E\left[h'_{mT+\ell} \mid \{y'_{nT}\},\ n \in \mathcal{N} \subset \mathbb{Z}\right]. \qquad (3)$$

The use of an MMSE estimator implies that $\hat{h}'_{mT+\ell} \sim \mathcal{CN}(0, \hat{\sigma}_\ell^2)$, where $\hat{\sigma}_\ell^2$ is the estimator variance. From orthogonality, $\tilde{h}'_{mT+\ell} \sim \mathcal{CN}(0, \sigma_h^2 - \hat{\sigma}_\ell^2)$. That is, $\hat{h}'_{mT+\ell}$ and $\tilde{h}'_{mT+\ell}$ are independent. To characterize the performance of a particular estimator, we define the estimator quality as

$$\omega_\ell \triangleq \hat{\sigma}_\ell^2 / \sigma_h^2. \qquad (4)$$

Note that orthogonality implies that $0 \le \omega_\ell \le 1$. Let

$$\kappa_0 = \sigma_h^2 E_0 / \sigma_N^2, \qquad \kappa_1 = \sigma_h^2 E_1 / \sigma_N^2$$

denote the faded pilot and data energies. We consider the following three estimators (derivations are given in appendix A with $R_h(\tau) = \sigma_h^2 a^{|\tau|}$).
E1. The Q(1,0) estimator uses only the most recent pilot observation to predict the subsequent $T-1$ channel states before the next pilot, i.e., the channel estimate $\ell$ positions after the most recent pilot is given by $\hat{h}'_{mT+\ell} = E\left[h'_{mT+\ell} \mid y'_{mT}\right]$. Evaluating, we find that

$$\hat{h}'_{mT+\ell} = \frac{\sqrt{E_0}\,\sigma_h^2\, a^{\ell}}{E_0 \sigma_h^2 + \sigma_N^2}\, y'_{mT}, \qquad \omega_\ell^{(1,0)} = a^{2\ell}\, \frac{\kappa_0}{1 + \kappa_0}. \qquad (5)$$
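E2. The Q(∞,0) estimator uses all past pilots to predict the current channel state, i.e.,

$$\hat{h}'_{mT+\ell} = E\left[h'_{mT+\ell} \mid \{y'_{pT}\}_{p=-\infty}^{m}\right],$$

with

$$\omega_\ell^{(\infty,0)} = a^{2\ell} \left[ 1 - \frac{1}{\tfrac{1}{2}(1 + \kappa_0) + \sqrt{\tfrac{1}{4}(1 + \kappa_0)^2 + \dfrac{\kappa_0\, a^{2T}}{1 - a^{2T}}}} \right]. \qquad (6)$$

Additional details on this derivation are given in appendix B.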


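E3. The Q(1,1) estimator is a non-causal smoother which uses the last and "next" pilot observations to predict the current channel state, i.e., $\hat{h}'_{mT+\ell} = E\left[h'_{mT+\ell} \mid y'_{mT},\, y'_{(m+1)T}\right]$. Evaluating, we find that

$$\hat{h}'_{mT+\ell} = \frac{\sqrt{E_0}\,\sigma_h^2}{\sigma_N^2} \left[ \Gamma(\ell)\, y'_{mT} + \Gamma(T-\ell)\, y'_{(m+1)T} \right],$$

$$\omega_\ell^{(1,1)} = \kappa_0 \left[ \left(\Gamma^2(\ell) + \Gamma^2(T-\ell)\right)(1 + \kappa_0) + 2\kappa_0\, \Gamma(\ell)\, \Gamma(T-\ell)\, a^{T} \right], \qquad (7)$$

where

$$\Gamma(\ell) \triangleq \frac{a^{\ell}(\kappa_0 + 1) - \kappa_0\, a^{2T-\ell}}{(\kappa_0 + 1)^2 - \kappa_0^2\, a^{2T}}.$$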
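The three estimator qualities can be computed directly from the Gauss-Markov correlation model. The sketch below is an independent numerical check rather than the report's derivation: the Q(1,1) quality is obtained from the two-observation linear MMSE formula, and the Q(∞,0) quality from the steady state of a scalar Kalman recursion on the pilot-rate samples (both are standard techniques chosen here for illustration). All function names are hypothetical, and variances are normalized so that $\sigma_h^2 = 1$.

```python
import math


def omega_10(ell, a, k0):
    """Quality of the last-pilot-only predictor, ell slots after the pilot."""
    return a ** (2 * ell) * k0 / (1.0 + k0)


def omega_11(ell, T, a, k0):
    """Quality of the two-pilot smoother via the 2x2 linear MMSE formula."""
    d = 1.0 + 1.0 / k0                    # variance of each scaled pilot observation
    r = a ** T                            # channel correlation between the two pilots
    c1, c2 = a ** ell, a ** (T - ell)     # correlation of the target with each pilot
    return (d * (c1 * c1 + c2 * c2) - 2.0 * r * c1 * c2) / (d * d - r * r)


def omega_inf0(ell, T, a, k0, iters=100000, tol=1e-14):
    """Quality of the all-past-pilots predictor via a steady-state Kalman recursion."""
    b = a ** (2 * T)
    p = 1.0                               # normalized prediction error at a pilot slot
    for _ in range(iters):
        f = p / (1.0 + k0 * p)            # filtered error after observing the pilot
        p_next = b * f + (1.0 - b)        # propagate T slots to the next pilot
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    f = p / (1.0 + k0 * p)
    return a ** (2 * ell) * (1.0 - f)


def omega_inf0_closed_form(ell, T, a, k0):
    """Closed-form root of the same recursion, for cross-checking."""
    b = a ** (2 * T)
    g = 0.5 * (1.0 + k0) + math.sqrt(0.25 * (1.0 + k0) ** 2 + k0 * b / (1.0 - b))
    return a ** (2 * ell) * (1.0 - 1.0 / g)


for ell in range(1, 8):
    print(ell, omega_10(ell, 0.9, 2.0), omega_inf0(ell, 8, 0.9, 2.0), omega_11(ell, 8, 0.9, 2.0))
```

Printing the three columns for $T = 8$ illustrates the qualitative behavior discussed in the text: the causal qualities decay with $\ell$, while the smoother is symmetric about the midpoint between pilots.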

We assume that perfect interleaving is performed at the transmitter [4], and that channel estimation is done before de-interleaving at the receiver (see appendix C for a discussion of interleaving). The observation equation is now

$$y_k = \sqrt{E_k}\, h_k s_k + n_k,$$

where $h_k$ is an i.i.d. sequence representing the interleaved channel, and where $n_k \sim \mathcal{CN}(0, \sigma_N^2)$ is another AWGN sequence. Writing this in terms of the channel estimate and error, we have

$$y_k = \sqrt{E_{\lceil [k]/T \rceil}}\, h_k s_k + n_k = \sqrt{E_{\lceil [k]/T \rceil}}\, \hat{h}_k s_k + \sqrt{E_{\lceil [k]/T \rceil}}\, \tilde{h}_k s_k + n_k, \qquad (8)$$

where $\hat{h}_k$ and $\tilde{h}_k$ are independent of each other and are independent sequences with the same marginal statistics as in equation 2. That is, interleaving does not change the marginal statistics of the channel estimate and estimation error: $\hat{h}_{mT+\ell} \sim \mathcal{CN}(0, \hat{\sigma}_\ell^2)$ and $\tilde{h}_{mT+\ell} \sim \mathcal{CN}(0, \sigma_h^2 - \hat{\sigma}_\ell^2)$, with $\hat{\sigma}_\ell^2$ the estimator variance. The estimator quality is defined by equation 4 as before. Finally, we assume that codewords are decoded using the ML detector, which treats $s_{mT+\ell}$ as the channel input and the pair $(y_{mT+\ell}, \hat{h}_{mT+\ell})$ as the channel output.

In section 3, it will be seen that the cutoff rate $R_0$ is an (increasing) function of $\omega_\ell$. Therefore, it is useful to compare the estimator quality expressions in equations 5 to 7, as we have done previously in [30]. Note that the estimator quality of the Q(1,0) and Q(∞,0) estimators decreases monotonically with $\ell$, i.e., distance from the last pilot; the quality of the Q(1,1) estimator is symmetric, with worst performance midway between the two pilots.

It can be verified that $\omega_\ell^{(1,0)} \le \omega_\ell^{(\infty,0)}$ and $\omega_\ell^{(1,0)} \le \omega_\ell^{(1,1)}$, and that no similar inequality can be given between $\omega_\ell^{(\infty,0)}$ and $\omega_\ell^{(1,1)}$. In the high SNR regime ($\kappa_0 \to \infty$), $\omega_\ell^{(\infty,0)} \to \omega_\ell^{(1,0)}$. This is because the channel is learnt perfectly in each pilot slot, and so additional past pilots do not improve the estimator. For a rapidly fading channel, as $a \to 0$, $\omega_\ell^{(\infty,0)} \to \omega_\ell^{(1,0)}$, since it is the most recent pilot that provides most of the information about the channel state. For a nearly static channel, i.e., as $a \to 1$, $\omega_\ell^{(\infty,0)} \ge \omega_\ell^{(1,1)}$. This is because the Q(∞,0) estimator provides an infinite number of noisy looks at the static channel, whereas the Q(1,1) estimator provides only two noisy looks. Further, $\omega_\ell^{(\infty,0)} \le \omega_\ell^{(1,1)}$ if the channel is varying rapidly or the SNR is large: When $a \to 0$, it is the closest pilots that contain channel information; the Q(1,1) estimator provides two "close" pilots. As $\kappa_0 \to \infty$, the Q(∞,0) estimator converges to the Q(1,0) estimator, which is outperformed by the Q(1,1) estimator. Lastly, we note that all three estimators become equivalent for high-SNR static channels, i.e., as $a \to 1$ and $\kappa_0 \to \infty$.


3.  Cutoff  Rate 


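In this section we derive the cutoff rate for general binary signaling given the estimation front end described in section 2. The results and discussion in this section were first given (without proofs) in [28].


The cutoff rate for the system described by equation 8 is given by

$$R_0 = -\min_{Q(\cdot)} \frac{1}{T} \log_2 \int_{\mathbf{y}} \int_{\hat{\mathbf{h}}} \left[ \sum_{\mathbf{s} \in S_1 \times \cdots \times S_{T-1}} Q(\mathbf{s}) \sqrt{P(\mathbf{y}, \hat{\mathbf{h}} \mid \mathbf{s})} \right]^2 d\hat{\mathbf{h}}\, d\mathbf{y}$$

$$\phantom{R_0} = -\min_{Q(\cdot)} \frac{1}{T} \log_2 E_{\hat{\mathbf{h}}} \left[ \int_{\mathbf{y}} \sum_{\mathbf{s}, \mathbf{v} \in S_1 \times \cdots \times S_{T-1}} Q(\mathbf{s})\, Q(\mathbf{v}) \sqrt{p(\mathbf{y} \mid \hat{\mathbf{h}}, \mathbf{s})} \sqrt{p(\mathbf{y} \mid \hat{\mathbf{h}}, \mathbf{v})}\, d\mathbf{y} \right], \qquad (9)$$

where $\hat{\mathbf{h}} = [\hat{h}_{mT+1}, \ldots, \hat{h}_{(m+1)T-1}]^t$, $\mathbf{y} = [y_{mT+1}, \ldots, y_{(m+1)T-1}]^t$, and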
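$\mathbf{s} = [s_{mT+1}, \ldots, s_{(m+1)T-1}]^t$ are the estimated channel, observation, and signal corresponding to the $T$-length "super-symbol." The input distribution is

$$Q(\mathbf{s}) = \prod_{\ell=1}^{T-1} Q_\ell(s_{mT+\ell}), \quad \text{where } Q_k(A_k) = p_k \text{ and } Q_k(B_k) = 1 - p_k.$$

In appendix D it is shown that the cutoff rate for generalized binary transmission becomes (setting $E_\ell = E_1$ in the appendix)

$$R_0 = -\frac{1}{T} \sum_{\ell=1}^{T-1} \min_{p_\ell |A_\ell|^2 + (1-p_\ell)|B_\ell|^2 = 1} \log_2 \left\{ 1 + 2 p_\ell (1 - p_\ell) \left[ \frac{\sqrt{\left(1 + \kappa_1 |A_\ell|^2 (1 - \omega_\ell)\right)\left(1 + \kappa_1 |B_\ell|^2 (1 - \omega_\ell)\right)}}{1 + \tfrac{\kappa_1}{2}\left(|A_\ell|^2 + |B_\ell|^2\right)(1 - \omega_\ell) + \tfrac{\kappa_1}{4}\, \omega_\ell\, |A_\ell - B_\ell|^2} - 1 \right] \right\}, \qquad (10)$$

where $\kappa_1 = \sigma_h^2 E_1 / \sigma_N^2$ was previously defined as the faded SNR during data transmissions. Next, we consider two important special-case input distributions: OOK and BPSK.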
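The per-slot summand of the generalized binary cutoff rate is straightforward to evaluate. The sketch below assumes the Bhattacharyya-type form that follows from the model of section 2 (two complex Gaussian hypotheses whose means are proportional to the channel estimate and whose variances depend on the estimation error); the function name `slot_cutoff_rate` is hypothetical. It is used here only to confirm two sanity checks: BPSK with perfect CSI reduces to the familiar $1 - \log_2(1 + 1/(1+\kappa_1))$, and BPSK with no CSI carries no information.

```python
import math


def slot_cutoff_rate(p, A, B, kappa1, omega):
    """Per-slot cutoff rate (bits) for input {A w.p. p, B w.p. 1-p}.

    kappa1 is the faded data SNR and omega the estimator quality in [0, 1].
    """
    # Normalized observation variances under each hypothesis (noise variance = 1).
    var_a = 1.0 + kappa1 * abs(A) ** 2 * (1.0 - omega)
    var_b = 1.0 + kappa1 * abs(B) ** 2 * (1.0 - omega)
    # Average variance plus the mean-separation term contributed by the known part.
    denom = 0.5 * (var_a + var_b) + 0.25 * kappa1 * omega * abs(A - B) ** 2
    overlap = math.sqrt(var_a * var_b) / denom
    return -math.log2(1.0 + 2.0 * p * (1.0 - p) * (overlap - 1.0))


# BPSK with perfect CSI at a faded SNR of 3 (linear scale).
print(slot_cutoff_rate(0.5, 1.0, -1.0, 3.0, 1.0))
# BPSK with no CSI: the two hypotheses are statistically identical.
print(slot_cutoff_rate(0.5, 1.0, -1.0, 3.0, 0.0))
# Equiprobable OOK with amplitude sqrt(2) and no CSI still yields a positive rate.
print(slot_cutoff_rate(0.5, math.sqrt(2.0), 0.0, 3.0, 0.0))
```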


3.1  On-Off  Keying 


It is shown in appendix E that when $\omega_\ell = 0$, the optimal input for the $\ell$th slot is a form of OOK, for which $B_\ell = 0$ and $|A_\ell|^2 = 1/p_\ell > 0$. For general $\omega_\ell$ the OOK cutoff rate becomes

$$R_0 = -\frac{1}{T} \sum_{\ell=1}^{T-1} \min_{p_\ell |A_\ell|^2 = 1} \log_2 \left\{ 1 + 2 p_\ell (1 - p_\ell) \left[ \frac{\sqrt{1 + \kappa_1 (1 - \omega_\ell)/p_\ell}}{1 + \tfrac{1}{4} \kappa_1 (2 - \omega_\ell)/p_\ell} - 1 \right] \right\}.$$

In general, it is not possible to analytically optimize equation 10 over $p_\ell$, as it leads to solving a high-order polynomial that has no explicit solution as a function of $\kappa_1$ and $\omega_\ell$. However, it is shown in appendix E that as $\kappa_1 \to \infty$, $p_\ell^* \to 1/2$, and that as $\kappa_1 \to 0$, $p_\ell^* \to 0$ (this corresponds to no information transmission). In general $0 \le p_\ell^* \le 1/2$ (i.e., the probability of being 'OFF' is at least 1/2). In figure 1, we plot $p_\ell^*$ as a function of $\kappa_1$ for several values of $\omega_\ell$. We see that for moderate to large values of $\kappa$, letting $p = 1/2$ is a reasonable approximation to $p^*$. In the sequel, we let $p_\ell = 1/2$, for which the cutoff rate becomes

$$R_0 = -\frac{1}{T} \sum_{\ell=1}^{T-1} \log_2 \left[ \frac{1}{2} + \frac{1}{2}\, \frac{\sqrt{1 + 2\kappa_1 (1 - \omega_\ell)}}{1 + \kappa_1 \left(1 - \tfrac{\omega_\ell}{2}\right)} \right]. \qquad (11)$$

To gain insight into the behavior of equation 11, we plot the $k$th term in the sum above in figure 2, for several values of $\omega_k$. As $\kappa_1 \to \infty$, the $k$th term approaches 1 regardless of $\omega_k$. Therefore, as $\kappa_1 \to \infty$, $R_0 \to (T-1)/T$ for any value of $\{\omega_1, \ldots, \omega_{T-1}\}$.
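The behavior of the optimal 'ON' probability can be explored with a simple grid search over $p_\ell$. This is a rough numerical sketch, not the appendix E analysis; the function names and the grid resolution are illustrative choices, and the per-slot expression assumes the OOK form with $B_\ell = 0$ and $|A_\ell|^2 = 1/p_\ell$ described above.

```python
import math


def ook_slot_rate(p, kappa1, omega):
    """Per-slot OOK cutoff rate with B = 0 and |A|^2 = 1/p."""
    overlap = math.sqrt(1.0 + kappa1 * (1.0 - omega) / p) / (
        1.0 + 0.25 * kappa1 * (2.0 - omega) / p
    )
    return -math.log2(1.0 + 2.0 * p * (1.0 - p) * (overlap - 1.0))


def best_on_probability(kappa1, omega, steps=2000):
    """Grid search for the rate-maximizing ON probability on (0, 1/2]."""
    grid = [(i + 1) / (2.0 * steps) for i in range(steps)]
    return max(grid, key=lambda p: ook_slot_rate(p, kappa1, omega))


for snr in (0.01, 1.0, 100.0, 1e4):
    p_star = best_on_probability(snr, 0.5)
    loss = ook_slot_rate(p_star, snr, 0.5) - ook_slot_rate(0.5, snr, 0.5)
    print(snr, p_star, loss)
```

The printed `loss` column shows how much per-slot rate is given up by fixing $p_\ell = 1/2$ instead of using the grid optimum.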


Figure 1. The optimal probability of transmitting a '1' ($A$), $p^*$, vs. faded SNR $\kappa$ (dB), for different values of the estimator quality $\omega$.


Figure 2. The OOK cutoff rate $R_0$ vs. faded SNR $\kappa$ (dB), for different values of the estimator quality $\omega$.

Since $|A_\ell|^2 = 1/p_\ell$, we can equivalently repeat figure 1 in terms of the optimal OOK amplitude versus faded SNR $\kappa_1$, as shown in figure 3. Note that as $\kappa_1$ increases, the optimal OOK amplitude decreases and (correspondingly) $p_\ell^*$ increases. This trend was shown in [1] for the capacity metric and for no CSI ($\omega_\ell = 0$). From the figure, we see that when $\omega_\ell > 0$ this general trend is still true, and that for fixed $\kappa_1$, the optimal amplitude is a decreasing function of $\omega_\ell$. As $\kappa_1$ becomes large, $A_\ell^* \to \sqrt{2}$, and correspondingly, $p_\ell^* \to 1/2$ (which is the conventional form of OOK). In section 3.3 it is noted that the value of $R_0$ is not affected much if $p_\ell = 1/2$ is used in place of the optimal value of $p_\ell$. Thus, conventional OOK can be used without sacrificing rate.

Figure 3. The optimal OOK amplitude vs. faded SNR $\kappa_1$ (dB), for different values of the estimator quality $\omega$.

3.2  BPSK 


It can be verified that, when $\omega_\ell = 1$, the optimal input in equation 10 is BPSK ($A_\ell = -B_\ell$ and $p_\ell = 1/2$). For BPSK,

$$R_0 = -\frac{1}{T} \sum_{\ell=1}^{T-1} \log_2 \left[ \frac{1}{2} + \frac{1}{2}\, \frac{1 + \kappa_1 (1 - \omega_\ell)}{1 + \kappa_1} \right]. \qquad (12)$$

To gain insight into the behavior of equation 12, we plot the $\ell$th term in the sum above in figure 4, for several values of $\omega_\ell$. We make the following observations:


1. The estimator quality places an asymptotic ceiling on $R_0$. For large $\kappa_1$ the cutoff rate saturates to

$$R_0 = -\frac{1}{T} \sum_{\ell=1}^{T-1} \log_2 \left( 1 - \frac{\omega_\ell}{2} \right).$$

2. When $\omega_\ell = 0$ (i.e., no CSI is available), information transmission is not possible. This is because the statistics of $y_k$ at the receiver are independent of $s_k$; i.e.,

$$y_k \mid s_k = 1 \sim \mathcal{CN}\left(0,\, E_1 \sigma_h^2 + \sigma_N^2\right),$$
$$y_k \mid s_k = -1 \sim \mathcal{CN}\left(0,\, E_1 \sigma_h^2 + \sigma_N^2\right).$$

Figure 4. The BPSK cutoff rate $R_{0,B}$ vs. faded SNR $\kappa_1$ (dB), for different values of the estimator quality $\omega$.

3.3  Comparisons 

Here we compare the cutoff rate for BPSK to that of OOK. We start by looking at the two hypotheses for each modulation type. For BPSK, the statistics of $y_k$ ($1 \le k \le T-1$) under the two hypotheses, conditioned upon the known part of the channel $\hat{h}_k$, are

$$y_k \mid \hat{h}_k,\, s_k = 1 \sim \mathcal{CN}\left(\sqrt{E_1}\, \hat{h}_k,\; E_1 \sigma_h^2 (1 - \omega_k) + \sigma_N^2\right),$$
$$y_k \mid \hat{h}_k,\, s_k = -1 \sim \mathcal{CN}\left(-\sqrt{E_1}\, \hat{h}_k,\; E_1 \sigma_h^2 (1 - \omega_k) + \sigma_N^2\right).$$

The ability to distinguish between the two hypotheses is only through the difference in the means, and therefore it is critical that $\omega_k > 0$. When $\kappa_1 \gg 1$ (i.e., when $E_1 \sigma_h^2 (1 - \omega_k) \gg \sigma_N^2$), the statistics become

$$y_k \mid \hat{h}_k,\, s_k = 1 \sim \mathcal{CN}\left(\sqrt{E_1}\, \hat{h}_k,\; E_1 \sigma_h^2 (1 - \omega_k)\right),$$
$$y_k \mid \hat{h}_k,\, s_k = -1 \sim \mathcal{CN}\left(-\sqrt{E_1}\, \hat{h}_k,\; E_1 \sigma_h^2 (1 - \omega_k)\right).$$

Increasing $\kappa_1$ scales the variance and the power in the mean equally, and so for large $\kappa_1$, i.e., $\kappa_1 \gg 1$, performance saturates as depicted in figure 4.

Note that the OOK cutoff rate is non-zero when $\omega_\ell = 0$ (see figure 2), which is not the case for BPSK. As $\kappa_1 \to \infty$, each term of the OOK cutoff rate approaches 1 for any value of $\omega_\ell$. The statistics of $y_k$ under the two hypotheses are

$$y_k \mid \hat{h}_k,\, s_k = 0 \sim \mathcal{CN}\left(0,\; \sigma_N^2\right),$$
$$y_k \mid \hat{h}_k,\, s_k = \sqrt{2} \sim \mathcal{CN}\left(\sqrt{2E_1}\, \hat{h}_k,\; 2E_1 \sigma_h^2 (1 - \omega_k) + \sigma_N^2\right).$$

The distance between the means is reduced compared to that for BPSK; however, the variance terms are now distinct. We expect that if the difference in the variance terms is large enough (i.e., if $\kappa_1$ is large enough), then OOK will be able to outperform BPSK despite the decreased separation between the means. Conversely, for small $\kappa_1$ (when the variance terms are nearly identical), we expect BPSK to outperform OOK. The general tradeoff is shown in figure 5 for $\omega_\ell = 0.8$. In the figure, we also plot OOK where the optimum $p_\ell^*$ is used for each value of $\kappa_1$. Note that, even for small $\kappa_1$, the difference between the two OOK curves is small; note also that there is a significant gain in using OOK instead of BPSK at moderate-to-large $\kappa_1$.

In summary, we find that OOK is preferred to BPSK when the SNR is larger, or when the estimation quality $\omega_\ell$ is smaller. We note that BPSK is preferred to OOK for smaller $\kappa_1$ and/or for better (larger) estimation quality. This qualitative analysis is quantified in section 6.


Figure 5. A comparison of the BPSK cutoff rate, the unoptimized OOK cutoff rate, and the optimized OOK cutoff rate vs. $\kappa_1$ (dB) for $\omega_\ell = 0.8$.
4. Optimized Training for BPSK

Here, we look at the optimized training parameters when the input is BPSK. We study the optimal energy allocation $(\kappa_0^*, \kappa_1^*)$ and optimal training period $T^*$ for the estimators E1-E3. For a meaningful analysis, we impose an average energy constraint, $\kappa_0 + (T-1)\kappa_1 = \kappa_{av} T = \kappa_{tot}$. First, we find $(\kappa_0^*, \kappa_1^*)$ for a fixed $T$. Then, we consider the optimal value of $T$. We consider the case of variable energy data slots in section 5. Many of the results in sections 4.1 through 4.3 were first given in [30], where proofs were omitted due to space constraints.

4.1  Optimal  Energy  Allocation 

For the Q(1,0) estimator, $\kappa_0^*$ (from which $\kappa_1^*$ follows readily) is shown in appendix F to be

$$\kappa_0^* = \frac{\sqrt{(T-1)(\kappa_{tot} + 1)(\kappa_{tot} + T - 1)} - (\kappa_{tot} + T - 1)}{T - 2} \qquad (13)$$

for $T > 2$. For $T = 2$, $\kappa_0^* = \kappa_{av}$. Note that $\kappa_0^*$ does not depend on $a$; however, $R_0(\kappa_0^*)$ does. In the low energy regime ($\kappa_{tot} \to 0$), equation 13 leads to $\kappa_0^* = (\kappa_{av}/2)T$, i.e., half of the available energy should be allocated to the pilot symbol. The 50-percent training paradigm has also been reported in [14] for a different channel model, metric, and assumptions. In the high energy regime ($\kappa_{tot} \to \infty$), we find that $\kappa_0^* = \kappa_{av} T / (1 + \sqrt{T-1})$.

For large $T$, the energy allocated to the training symbol increases as $\kappa_{av}\sqrt{T}$; a similar result was reported in [14].
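The pilot energy allocation for the Q(1,0) estimator can be cross-checked numerically. The sketch below assumes the BPSK objective implied by section 3.2 with $\omega_\ell^{(1,0)} = a^{2\ell}\kappa_0/(1+\kappa_0)$: every term of the cutoff rate is then increasing in the product $\kappa_0\kappa_1 / ((1+\kappa_0)(1+\kappa_1))$, so maximizing that product under the energy constraint gives the allocation. The closed form in `kappa0_closed_form` is the stationary point of this product obtained by elementary calculus; it is offered as a derivation under these assumptions, and both function names are hypothetical.

```python
import math


def kappa0_closed_form(kappa_tot, T):
    """Stationary point of k0*k1/((1+k0)(1+k1)) subject to k0 + (T-1)*k1 = kappa_tot."""
    if T == 2:
        return kappa_tot / 2.0
    s = kappa_tot + T - 1.0
    return (math.sqrt((T - 1.0) * (kappa_tot + 1.0) * s) - s) / (T - 2.0)


def kappa0_grid(kappa_tot, T, steps=200000):
    """Brute-force maximizer of the same objective on a uniform grid."""
    def objective(k0):
        k1 = (kappa_tot - k0) / (T - 1.0)
        return k0 * k1 / ((1.0 + k0) * (1.0 + k1))

    grid = (kappa_tot * i / steps for i in range(1, steps))
    return max(grid, key=objective)


for kappa_tot, T in ((8.0, 8), (0.1, 4), (100.0, 16)):
    print(kappa_tot, T, kappa0_closed_form(kappa_tot, T), kappa0_grid(kappa_tot, T))

# Limiting fractions of the total energy spent on the pilot for T = 8.
print(kappa0_closed_form(1e-6, 8) / 1e-6)   # low energy: about one half
print(kappa0_closed_form(1e8, 8) / 1e8)     # high energy: about 1/(1 + sqrt(T-1))
```

The two limiting fractions reproduce the low-energy and high-energy regimes described in the text.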

For the Q(∞,0) estimator, $\kappa_0^*$ is shown in appendix F to be given implicitly by (for $T > 2$)

$$\kappa_0^* = \arg\max_{0 < \kappa_0 < \kappa_{av} T}\; \frac{\kappa_{av} T - \kappa_0}{\kappa_{av} T - \kappa_0 + (T - 1)} \left[ 1 - \frac{2}{\kappa_0 + 1 + \sqrt{(1 + \kappa_0)^2 + \dfrac{4 \kappa_0\, a^{2T}}{1 - a^{2T}}}} \right]. \qquad (14)$$

This implicit solution provides useful insights. In the low energy regime, equation 14 states that $\kappa_0^* = (\kappa_{av}/2)T$. In the high energy regime, $\kappa_0^* = \kappa_{av} T / (1 + \sqrt{T-1})$. In these two limiting cases, the Q(∞,0) and Q(1,0) estimators have the same optimal energy allocation, which is independent of $a$. In general, $\kappa_0^*$ decreases as $a$ (which is a measure of channel predictability) increases: As $a \to 1$, $\kappa_0^* \to 0$. This is because the Q(∞,0) estimator provides us with an infinite number of (noisy) observations of the nearly time-invariant channel. Each observation requires only a minuscule amount of energy in order to make use of the infinite diversity gain. As $a \to 0$, $\kappa_0^*$ converges to the $\kappa_0^*$ of equation 13: for a rapidly fading channel, the most recent pilot provides all the information about the channel. In general, it is shown in appendix F that $\kappa_0^{*(\infty,0)} \le \kappa_0^{*(1,0)}$; the estimator of higher quality requires less training energy.


The energy allocation rules of equations 13 and 14 can be extended in a straightforward way to any causal estimator (i.e., any estimator of the form given in equation 3, where $\max\{\mathcal{N}\} \le m$). For any causal estimator with estimator quality $\omega_\ell$, $\kappa_0^*$ is given implicitly in terms of the estimation quality in the pilot slot, $\omega_0$, as follows:

$$\kappa_0^* = \arg\max_{0 < \kappa_0 < \kappa_{av} T}\; \frac{\kappa_{av} T - \kappa_0}{\kappa_{av} T - \kappa_0 + (T - 1)}\; \omega_0(\kappa_0, a, T, \mathcal{N}),$$

where the notation emphasizes that $\omega_0$ is a function of $\kappa_0$, $a$, $T$, and $\mathcal{N}$. The proof follows easily using the methodology for the Q(∞,0) estimator, and is a consequence of the fact that, for any causal estimator, $\omega_\ell = a^{2\ell}\, \omega_0$.

Finally, we give $\kappa_0^*$ for the Q(1,1) estimator. For simplicity, we consider the case where $T > 2$. Because no closed form expression for $\kappa_0^*$ exists in general, we will focus on the low energy, high energy, rapidly fading ($a \ll 1$), and slowly fading ($a \to 1$) regimes. The optimal training energy in the low SNR regime can again be shown to be given by $\kappa_0^* = (\kappa_{av}/2)T$. In the high SNR regime, it is shown in appendix F that $\kappa_0^* \approx \kappa_{av}\sqrt{T}$, with equality when $T$ becomes large. For rapid fading ($a \ll 1$), it can be seen that the optimal training energy converges to that of the Q(1,0) estimator, i.e., $\kappa_0^{*(1,1)} \to \kappa_0^{*(1,0)}$, as is to be expected. For slow fading ($a \to 1$), it can be shown that $\kappa_0^* = -\rho + \ldots$ with $\rho = \ldots$. In figure 15, we plot the optimal training energy for the limiting cases of rapid fading and slow fading for $T = 8$. The slowly fading channel makes use of about 10 percent more training energy compared to the fast fading channel. This percentage decreases for smaller values of $\kappa_{tot}$. We note that $\kappa_0^{*(1,1)} \le \kappa_0^{*(1,0)}$, i.e., the better estimator requires less training energy. The optimal training energy for each of the estimators is summarized in table 1.


4.2  Optimal  Training  Period 

The  preceding  analysis  gives  insights  into  the  optimal  energy  allocation  (ttg,  for  a  fixed 
training  period  T.  Below,  we  consider  the  optimal  training  period  T*  for  each  estimator. 

First  we  consider  the  causal  estimators.  It  is  shown  in  appendix  G  that  for  all  casual 
estimators,  a  lower  bound  on  the  optimal  value  of  the  training  period  Tb  can  be  found  by 
considering  the  high  energy  regime  (ttav  — t  cxo)  (this  is  equivalent  to  assuming  that  the 
channel  is  known  perfectly  in  the  relevant  training  slots).  Furthermore,  the  lower  bound 
Tb  is  exact  at  high  SNR,  and  is  the  same  for  all  casual  estimators,  depends  only  on  a  and 
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Table  1.  The  optimal  training  energy  for  the  Q(i,o))  Q(oo,o)  Q(i,i)  estimators  for 
high  SNR  (t^av  — t  cxo),  low  SNR  (t^av  — t  0),  rapid  fading  (a  1),  and  slow 
fading  (a  ^  1). 
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0 
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is  given  by: 


T-l 


Tb  =  argmin 


1  - 


^2{nT+e) 


1/T 


(15) 


e=i  -■ 

where n, 0 ≤ n < ∞, is the number of pilots between the most recent pilot and the last
pilot used. For any practical scheme, n = 0. For the Q(1,0) and Q(∞,0) estimators, n = 0,
and the bound becomes


    T_{B,(1,0)} = T_{B,(\infty,0)} = \arg\min_{T} \prod_{\ell=1}^{T-1} \left[ 1 - \tfrac{1}{2}\, a^{2\ell} \right]^{1/T}        (16)


Although the high-SNR bounds are equal, we find that, in general, T*_(∞,0) ≤ T*_(1,0) for the
equal energy case (κ_0 = κ_1 = κ_av). Furthermore, it is shown in appendix G that
T*_(1,0) ≥ T*_(∞,0) for any channel with a monotonically decreasing correlation function.

We see that the better estimation scheme requires more frequent pilots. This was noted
heuristically in [26]. Here, it has been proven analytically.
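
The high-SNR bound for the causal estimators lends itself to a direct numerical search. The following Python sketch is illustrative only. It assumes that the per-slot factor in the product is 1 − a^{2(nT+ℓ)}/2, which is equivalent to assuming that a BPSK data slot contributes 1 − log2(2 − ω_ℓ) bits with ω_ℓ = a^{2(nT+ℓ)} as κ_av → ∞. The function names and the search limit `t_max` are hypothetical.

```python
import math

def high_snr_cutoff_rate(a, T, n=0):
    """High-SNR cutoff rate per slot for training period T (assumed form)."""
    total = 0.0
    for slot in range(1, T):                     # data slots 1 .. T-1
        omega = a ** (2 * (n * T + slot))        # estimator quality as kappa_av -> infinity
        total += -math.log2(1.0 - 0.5 * omega)   # equals 1 - log2(2 - omega)
    return total / T

def training_period_bound(a, n=0, t_max=500):
    """Search T = 2 .. t_max for the period that maximizes the high-SNR rate.

    Maximizing this rate is the same as minimizing the product of the
    per-slot factors raised to the power 1/T.
    """
    return max(range(2, t_max + 1), key=lambda T: high_snr_cutoff_rate(a, T, n))

if __name__ == "__main__":
    for a in (0.80, 0.95, 0.98):
        T_b = training_period_bound(a)
        print(a, T_b, round(high_snr_cutoff_rate(a, T_b), 3))
```

Under these assumptions the search returns T_B = 7 for a = 0.98, with a saturated rate close to 0.71, in line with the values quoted for the BPSK-only system in section 6.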


For the Q(1,1) estimator, the lower bound is

    T_{B,(1,1)} = \arg\min_{T} \prod_{\ell=1}^{T-1} \left[ 1 - \frac{1}{2} \cdot \frac{a^{2\ell} + a^{2(T-\ell)} - 2a^{2T}}{1 - a^{2T}} \right]^{1/T}        (17)


We find that T_B,(1,1) ≥ T_B,(1,0) and that T*_(1,0) ≤ T*_(1,1).† Unlike the Q(1,0) vs.
Q(∞,0) case, the better estimation scheme requires less frequent training. This apparent
conflict is resolved by noting that the training period is not determined by the quality ω_ℓ of
the estimation scheme, but rather by how quickly ω_ℓ "falls off" as ℓ is increased. Table 2
compares T_B to T* for each estimator, for several values of κ_av and a. In general, the
bound is accurate for smaller values of a and larger values of κ_av. From the table, when
a = 0.80, the lower bound is exact for all κ_av ≥ 1. For a = 0.95, the lower bound is exact
for κ_av ≥ 10. For a = 0.99, the lower bound is tight for κ_av ≥ 10, and exact for κ_av ≥ 100.

†This can be proven for the equal energy case. We conjecture that it is also true when we optimize over
the training and data energy. The conjecture is also supported by numerical evidence.

Table 2. Comparison of the optimal training period T* to the lower bound T_B for
several different values of a and κ_av.

4.3  Cutoff  Rate  with  Optimized  Training 

First, we analyze how using each of the three proposed estimators affects the unoptimized
cutoff rate; see figure 6. We use the same energy in each transmission slot, and pick a
fixed, but reasonable, value for the training period (T = 10) based on table 2, with
a = 0.99. At low SNR, there is a 3 dB gain in the cutoff rate for the Q(∞,0) estimator over
the Q(1,0) estimator. As expected, this gain diminishes as SNR is increased. Also as
expected, the Q(1,1) estimator outperforms the other two at high SNR. The gain in using
the Q(1,1) estimator over the other two is as much as 3 dB at low to moderate SNR. As
SNR → ∞, the cutoff rate for the Q(1,1) estimator saturates to 0.8514, a value that exceeds
the saturation cutoff rate of 0.7829 for either of the other two estimators.

Next, we analyze how using each of the three proposed estimators affects the optimized
cutoff rate. In figure 7, we plot the cutoff rate, optimized over the energy allocation
(κ_0, κ_1) and training period T. Note that optimizing the cutoff rate effectively narrows the
gain of the more complicated estimators over the Q(1,0) estimator. In particular, the Q(1,0)
and Q(∞,0) estimators perform within 1 dB of each other. At high SNR, the optimized energy
allocation provides no gain; even a "sloppy" energy allocation will allow the cutoff rate to
saturate to its maximum value. A poor choice of T, however, will result in a large loss of
cutoff rate relative to the optimized value, even at high SNR.

Figure 6. The cutoff rate for three different estimators for T = 10, for equi-energy
transmission slots, and for a = 0.99.

Figure 7. The cutoff rate for three different training estimators, optimized over the
training period T and energy κ_0, for a = 0.99.

Next, we fix the particular estimator used and, by comparing figures 6 and 7, determine
the gain in cutoff rate attained by using optimized training parameters in place of the
unoptimized (but reasonable) parameters. For the Q(1,0) estimator, the gain is typically
3 to 4 dB at low SNR. For the Q(1,1) estimator, the gain is typically 2 to 3 dB. For
the Q(∞,0) estimator, the gain is typically about 0.5 dB. Note that the gain in using
optimized parameters diminishes as the estimation scheme uses more pilot symbols. This
suggests that the more pilot observations are exploited, the less the cutoff rate benefits
from an optimized energy allocation. We emphasize that the gain in the optimized cutoff
rate would have been even more dramatic had a "poor" value of T been chosen.

4.4  Mismatched  Doppler  Spread 

Previously, we have assumed that the Doppler spread a is known perfectly at the
transmitter. Here, we study how the cutoff rate is impacted when the transmitter has
inaccurate knowledge of the Doppler spread. Given imperfect knowledge of the Doppler
spread, the transmitter will, in general, incorrectly determine the energy allocation and the
training frequency. In the following analysis, we assume that the receiver has perfect
knowledge of the Doppler spread, and therefore, that channel prediction is carried out
without error. Let â = (1 + δ)a denote the transmitter's assumed value of the Doppler, so
that δ denotes the relative error. Note that, if the transmitter adapts its rate based on its
perceived value of a, it will transmit at rates larger than the cutoff rate if δ > 0. This
cannot be allowed, for these rates may not be supported by the channel. Hence, in this
section only, we assume that the transmitter uses a fixed transmission rate sufficiently
smaller than the cutoff rate of the channel. The cutoff rate discussed in this section should
be interpreted as a bound on the probability of length-N block decoding error, given by the
expression P_e ≤ 2^{−N(R_0 − R)} for transmission rates R < R_0.

For the Q(1,0) estimator, the optimal training energy κ_0 does not explicitly depend on a.
However, incorrect knowledge of a at the transmitter will result in an incorrect assignment
of T. If T is assigned incorrectly, then energy will also be allocated sub-optimally. In
figure 8, we plot the normalized cutoff rate R_0(â)/R_0(a) for −0.4 ≤ δ ≤ 0.05
(corresponding to 0.57 ≤ â ≤ 0.998) for a = 0.95 and κ_av = 100. Also included in the figure
is the training period T̂ selected by the transmitter as a function of δ. As expected, the
normalized cutoff rate ≈ 1 when â ≈ 0.95. From the figure, the degradation in the cutoff
rate is less than 25 percent even when a is underestimated by 40 percent. When a is
underestimated by up to 5 percent, there is virtually no loss in the cutoff rate. Conversely,
if a is instead overestimated by 5 percent, there is a drastic loss in the cutoff rate, more
than 35 percent. We see that it is better to underestimate a than to overestimate it. This is
because T* changes more rapidly as a is increased than when it is decreased (e.g., see
table 2). When a is overestimated, T̂ deviates quickly from T*. When a is underestimated,
T̂ deviates less quickly from T*. This behavior becomes more pronounced as a itself
becomes large. Next, we make two points from figure 8: (1) R_0(â) corresponds to the
perfect-a cutoff rate if T̂ were used in place of T*; (2) R_0(â) changes in "discrete" steps
(corresponding to changes in T̂ as δ is varied). These last two properties are properties of
the Q(1,0) estimator, and are a consequence of the fact that κ_0 does not depend explicitly
on a. For the other estimators, κ_0 depends explicitly on a and so the cutoff rate will decline
continuously as â deviates from a (as will be seen in the sequel). We note that the same
general trend holds as κ_av is varied, as can be seen in figures 9 and 10, where we repeat
the same analysis, this time for κ_av = 10 and κ_av = 0.1, respectively. Considering
figure 10, we see that the disparity is even larger at small SNR; overestimating a by
5 percent degrades the cutoff rate by about 60 percent, whereas underestimating a by the
same percentage results in a loss of less than 10 percent. We observe that, for a fixed value
of δ, the normalized cutoff rate decreases with decreasing SNR.

Figure 8. The normalized cutoff rate R_0(â)/R_0(a) and corresponding training period
T̂ for Doppler mismatch in the range −0.4 ≤ δ ≤ 0.05, when a = 0.95 and
κ_av = 100 for the Q(1,0) estimator.

Here, we repeat the preceding analysis for the Q(∞,0) estimator. In figure 11, we plot the
normalized cutoff rate under the same parameters as in the previous case, for κ_av = 10.
Again, we note that it is better to underestimate a than to overestimate it. We see
that in practice the normalized cutoff rate still changes in steps that are nearly discrete
(which correspond to incorrect assignments of the training period at the transmitter) for
the Q(∞,0) estimator. This implies that the degradation in the cutoff rate incurred when
using a mismatched value of the Doppler spread is primarily through an incorrect
assignment of T at the transmitter, and is only slightly affected by the incorrect allocation
of energy (κ_0, κ_1). The normalized cutoff rate is slightly larger for the Q(∞,0) estimator
than for the Q(1,0) estimator, as seen by comparing figures 9 and 11. This means that the
Q(∞,0) estimator is less sensitive to a mismatched Doppler parameter than is the Q(1,0)
estimator. We will say more on this point in what follows.

Figure 9. The normalized cutoff rate R_0(â)/R_0(a) and corresponding training period
T̂ for Doppler mismatch in the range −0.4 ≤ δ ≤ 0.05, when a = 0.95 and
κ_av = 10 for the Q(1,0) estimator.

Figure 10. The normalized cutoff rate R_0(â)/R_0(a) and corresponding training period
T̂ for Doppler mismatch in the range −0.4 ≤ δ ≤ 0.05, when a = 0.95 and
κ_av = 0.1 for the Q(1,0) estimator.

Figure 11. The normalized cutoff rate R_0(â)/R_0(a) and corresponding training period
T̂ for Doppler mismatch in the range −0.4 ≤ δ ≤ 0.05, when a = 0.95 and
κ_av = 10 for the Q(∞,0) estimator.

Next, figure 12 considers the Q(1,1) estimator for the same parameters as in the previous
two cases with κ_av = 10. Again, we notice that it is better to underestimate a than to
overestimate it and that the normalized cutoff rate changes in what are effectively discrete
steps. In figure 13 we superimpose the normalized cutoff rates for all three estimators when
κ_av = 10; figure 14 shows the corresponding training period T̂ determined by the
transmitter. We note that in general, any estimator may be the most (or least) sensitive to a
Doppler mismatch, depending on the particular parameters chosen.

In general (i.e., over a wide variety of simulation parameters) we have observed the
following trends: (1) We continue to find that it is better to underestimate a than to
overestimate it. (2) We find that the normalized cutoff rate R_0(â)/R_0(a) always changes in
(nearly) discrete steps, for each of the three estimators. This implies that when using a
mismatched value of a at the transmitter, the degradation is due primarily to choosing an
incorrect value of T and that the subsequent misallocation of energy has an insignificant
effect on the cutoff rate. The issue of how much the cutoff rate is degraded due to
mismatched a consists of two parts: (1) how quickly does T̂ change with â/a? (2) how
quickly does the cutoff rate degrade with T for a given estimator? These two issues must
be considered jointly. For example, consider the initial set of parameters (where κ_av = 10).
From table 2 or figure 14, we see that T*_(1,1) changes most rapidly with a and is followed by
T*_(1,0) and lastly by T*_(∞,0). However, from figure 13, it is evident that the cutoff rate of
the Q(1,1) estimator changes least rapidly with T over a wide range of δ (e.g., the range
−0.16 ≤ δ ≤ 0), so that the Q(1,1) estimator is the least sensitive to Doppler mismatch over
this range.
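
The asymmetry between underestimating and overestimating a can be illustrated with a small Python sketch. It is a simplification rather than the computation behind figures 8 through 14: it works only in the high-SNR limit, assumes the Q(1,0) per-slot rate 1 − log2(2 − a^{2ℓ}), lets the transmitter pick T̂ from â = (1 + δ)a, and ignores any energy misallocation. All function names are hypothetical.

```python
import math

def high_snr_rate(a, T):
    """Assumed high-SNR cutoff rate for the Q(1,0) estimator with period T."""
    return sum(-math.log2(1.0 - 0.5 * a ** (2 * slot)) for slot in range(1, T)) / T

def best_period(a, t_max=500):
    """Training period that maximizes the assumed high-SNR rate."""
    return max(range(2, t_max + 1), key=lambda T: high_snr_rate(a, T))

def normalized_rate(a, delta):
    """Rate achieved with the transmitter's period, relative to the matched optimum."""
    a_hat = (1.0 + delta) * a            # transmitter's assumed Doppler parameter
    if not 0.0 < a_hat < 1.0:
        raise ValueError("assumed Doppler parameter must lie in (0, 1)")
    T_hat = best_period(a_hat)           # period chosen from the mismatched value
    T_opt = best_period(a)               # period the true channel calls for
    return high_snr_rate(a, T_hat) / high_snr_rate(a, T_opt)

if __name__ == "__main__":
    for delta in (-0.40, -0.05, 0.0, 0.05):
        print(delta, round(normalized_rate(0.95, delta), 3))
```

In this simplified model, a 5 percent underestimate of a = 0.95 costs almost nothing, while a 5 percent overestimate selects a much longer period and loses a substantial fraction of the rate, which is the same qualitative behavior reported above.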
Figure 12. The normalized cutoff rate R_0(â)/R_0(a) and corresponding training period
T̂ for Doppler mismatch in the range −0.4 ≤ δ ≤ 0.05, when a = 0.95 and
κ_av = 10 for the Q(1,1) estimator.

Figure 13. The normalized cutoff rate R_0(â)/R_0(a) for Doppler mismatch in the range
−0.4 ≤ δ ≤ 0.05, when a = 0.95 and κ_av = 10 for all estimators.

Figure 14. The training period T̂ determined by the transmitter for Doppler mismatch
in the range −0.4 ≤ δ ≤ 0.05, when a = 0.95 and κ_av = 10 for all estimators.

Figure 15. The optimal percentage of training energy for the Q(1,1) estimator, κ_0,(1,1)/κ_tot,
for the limiting cases of a rapidly varying (a ≪ 1) and slowly varying (a ≈ 1)
channel for T = 8.

5. Variable Energy Data Slots


Here, we generalize the energy allocation problem of section 4 to the case where the ℓth
sub-channel is allocated an arbitrary energy κ_ℓ. We impose a total energy constraint
Σ_{ℓ=0}^{T−1} κ_ℓ = κ_tot (where κ_ℓ ≥ 0), and seek to find the energy allocation
κ = {κ_0, ..., κ_{T−1}} that maximizes the cutoff rate. We treat the training period T as a
fixed parameter, and consider the Q(1,0) estimator. We first presented the results in
sections 5.1 through 5.3 below in [28], where proofs were omitted due to space limitations.

For variable energy data slots, the observation equation in equation 8 now becomes

    y_k = \sqrt{\kappa_{[k]}}\, h_k s_k + n_k,

and the corresponding cutoff rate is (see appendix D)

    R_0 = \frac{1}{T} \sum_{\ell=1}^{T-1} \left[ 1 - \log_2\!\left( 1 + \frac{1 + \kappa_\ell (1 - \omega_\ell)}{1 + \kappa_\ell} \right) \right].        (18)

In seeking the optimal energy allocation, our intuition from water-filling over parallel
AWGN channels applies, as interleaving removes the correlation between the T
sub-channels with respect to coding (the correlation is exploited instead in the estimator
design). Water-filling predicts that more energy will be allocated to less noisy channels,
and that channels with noise levels above a threshold will not be used at all. We will see
that both of these ideas are preserved.

5.1  Substitution  Function 


Optimization of R_0 over κ does not lead to a closed-form solution for the optimal energy
allocation. Hence, we propose an approximate solution based on optimizing the
substitution function

    \sum_{\ell=1}^{T-1} \frac{a^{2\ell}\, \kappa_0\, \kappa_\ell}{(1 + \kappa_0)(1 + \kappa_\ell)}        (19)

over κ. We will denote the optimizer of the substitution function by κ̂*. Let κ* be the
optimal energy vector for R_0 in equation 18. We claim that κ̂* ≈ κ* for the following
reasons (proofs and further details are given in appendix H):

A1. The approximate solution is exact (i.e., κ̂* = κ*)
as a → 1 or as a → 0 or as κ_tot → 0.

A2. The appropriate Taylor expansion shows that κ̂* ≈ κ* if a ≪ 1 or if κ_0 ≪ 1 or if
κ_ℓ ≪ 1, ∀ℓ ≥ 1.

A3. Numerical simulations show that κ̂* ≈ κ* for moderate values of a, at moderate to
high values of κ_tot (this is the region where no theoretical justification has been
given).


Illustrative  examples  of  the  above  remarks  are  given  in  the  sequel. 


5.2  Optimal  Energy  Allocation 


The optimal energy vector κ̂* is specified by the following:
Theorem. (a) Use the first M data slots (T_A = M) iff


1)  <  Attot  < 


i-8{M  -T  +  iy 


where δ(x) is the Kronecker delta, 1 ≤ M ≤ T − 1, and


a 


1  —  a 


^  q  /I 

2  V  4  1-a^ 


(b) The optimal training energy (T_A ≠ 1) is given by:


(20) 


i^o{Ta)  —  {Ta  +  tttot)  +  +  A)  (Ta  +  tttot)  ~  (A  +  1)  (Ta  +  tCtot), 

where  A  =  1  (hi^)(i+^ji)_ 

i  a— a  A 

(c) The data energies (T_A ≠ 1) are given by, 1 ≤ ℓ ≤ T_A,

    \kappa_\ell = \frac{a^{\ell}}{\sum_{j=1}^{T_A} a^{j}} \left[ \kappa_{tot} - \kappa_0(T_A) + T_A \right] - 1.

(d) If T_A = 1, κ_0 = κ_1 = κ_tot/2.


Proof.  See  appendix  I. 


The channel assignment strategy of equation 20 is illustrated in figure 16 for a system with
κ_tot = 50, T = 21, and for several values of a. Consider one of the curves φ(M). The
candidate energy line intersects φ(M) between M = 10 and M = 11. Therefore, T_A = 11 is
the optimum number of data slots to activate.
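
The water-filling character of the data-slot allocation can be sketched in Python. This is an illustration under stated assumptions, not the theorem itself: the training energy κ_0 is taken as given (part (b) is not implemented), the substitution function is assumed to be the sum over data slots of a^{2ℓ} κ_0 κ_ℓ / ((1 + κ_0)(1 + κ_ℓ)), and the function name and arguments are hypothetical. With κ_0 fixed, maximizing that sum makes 1 + κ_ℓ proportional to a^ℓ on the active slots.

```python
def allocate_data_energy(a, T, data_energy):
    """Split data_energy over data slots 1..T-1 (assumed substitution objective).

    Maximizes sum_l a**(2*l) * k_l / (1 + k_l) subject to sum_l k_l = data_energy
    and k_l >= 0. Returns a list of T-1 energies; trailing slots may be zero.
    """
    if data_energy < 0.0:
        raise ValueError("data_energy must be non-negative")
    for active in range(T - 1, 0, -1):            # try the largest active set first
        weights = [a ** slot for slot in range(1, active + 1)]
        level = (data_energy + active) / sum(weights)
        energies = [w * level - 1.0 for w in weights]
        if energies[-1] >= 0.0:                   # weakest active slot is still usable
            return energies + [0.0] * (T - 1 - active)
    # Guard against rounding when only the first slot can be active.
    return [data_energy] + [0.0] * (T - 2)

if __name__ == "__main__":
    print([round(k, 3) for k in allocate_data_energy(0.85, 6, 5.0)])
    print([round(k, 3) for k in allocate_data_energy(0.5, 6, 0.1)])
```

As the data energy shrinks or a decreases, the later slots are switched off first, which mirrors the role of T_A in the theorem.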
Figure 16. An illustration of the test for determining T_A.


We look at some consequences of the Theorem:


1. T_A is an increasing function of a (see figure 4). This can be verified by noting that

    \frac{\partial \phi(M)}{\partial a} < 0, for 0 < a < 1.


2. As κ_tot → ∞, all T − 1 slots become active and

- 1 


—  1)  —  Attot 


h[a)  —  1 


where  hi  a')  -  (i+A(i-«'^  b 
wnere  nya)  —  ■ 

3. As κ_tot → 0, only the first data slot is active and

    \kappa_0(1) = \kappa_{tot} / 2.


5.3  Numerical  Simulations 


In this section, we show that κ̂* ≈ κ* using numerical techniques. Define the normalized
error metric‡

    e \triangleq \frac{\| \hat{\kappa}^* - \kappa^* \|_1}{\kappa_{tot}}.

‡Here ||a||_1 denotes the 1-norm of the vector a.
Figure 17 compares κ̂* with κ* for κ_tot = 10 and T = 6, and for several values of a.
Remarks A1 and A2 predict that the approximate solution performs well for a = 0.2 and
a = 0.98. This is verified in the figure, both graphically and from the e metric. We observe
that the solution is also close for a = 0.5 and a = 0.7. Note that in all cases the
approximate solution correctly predicts the number of active slots T_A.

In figure 18, we compare κ̂* and κ* for a = 0.85, T = 6, and for different values of κ_tot.
Remarks A1 and A2 predict accuracy for κ_tot = 0.1. We see that the normalized error e
remains small for the higher values of κ_tot as well. Again, the approximate solution
correctly predicts T_A in each case.

Figure 17. Comparison of κ̂* with κ* for κ_tot = 10 and different values of a. The x-axis
indicates each of the T slots. The y-axis shows the energy placed in each of
the slots.

5.4  Effect  Upon  Cutoff  Rate 

In this section we assess the value of variable data energy allocation over other energy
allocation strategies using the cutoff rate metric. In figure 19, we plot: (a) R_(0,A), the
cutoff rate when all transmission slots use the same energy, so that κ_0 = κ_1 = κ_av;
(b) R_(0,B), the cutoff rate for equal energy data slots, where κ_0 is determined from
equation 13; (c) R_(0,C), the variable data energy cutoff rate using the approximate optimal
energy distribution given by the substitution function (equation 20); and (d) R_(0,D), the
variable data energy cutoff rate using the true optimal energy vector determined
numerically. The simulation is for a = 0.98, and in all cases, the optimal value of the
training period T is used.

Figure 18. Comparison of κ̂* with κ* for a = 0.85 and different values of κ_tot.

Figure 19. The cutoff rate for four different energy allocation strategies: R_(0,A) is the
equal-energy cutoff rate, R_(0,B) is the two-dimensional cutoff rate, R_(0,C) is
the variable energy cutoff rate using the substitution function, and R_(0,D) is
the variable energy cutoff rate using numerical optimization. The Doppler
parameter a = 0.98, and the optimal value of T is used in all cases.
From figure 19 we make two key observations. First, there is about a 2 dB gain in going
from R_(0,A) to R_(0,B) at low to moderate values of κ_av. That is, two-dimensional energy
optimization results in significant energy savings over an equal energy strategy. This gain
increases for larger values of a (slowly fading channels) and diminishes for smaller values of
a (rapidly fading channels). As κ_av → ∞, this gain diminishes to zero. This is because, in
the high energy scenario, even a sloppy allocation of energy will be sufficient for the cutoff
rate to saturate. Second, we note that the gain in R_(0,C) or R_(0,D) over R_(0,B) is negligible.
This means that there is practically no benefit in doing variable data energy allocation in
place of the fixed data energy allocation. This result holds true for all tested values of a
and κ_av.

We also note that these results hold true only if the optimal value of T can be chosen. If
the transmitter is unable to choose the optimal value of T, the more sophisticated energy
allocation strategies do indeed provide gains, as illustrated in figure 20. Here, we let
a = 0.88 and fix T = 15 for all strategies. Note that there is as much as an additional 2 dB
gain in going from R_(0,B) to R_(0,C) (or R_(0,D)). This is because, in this example, channel
predictability is poor. Scheme 'C' (or equivalently 'D') allows the later data slots to be
"turned off" since the channel will not be predicted accurately in those slots. The energy
saved is used instead in the earlier data slots in which the channel is predicted accurately.
Schemes 'A' and 'B' do not have this freedom; they are forced to allocate the same energy
to all data slots. Also, note that R_(0,C) ≈ R_(0,D). That is, the energy allocation derived
from the substitution function results in a cutoff rate that is practically the same as if the
exact optimal energy vector had been used.


6.  BPSK  and  OOK  Hybrid  Modulation 


We ask the following question: "Given partial CSI ω_ℓ at the receiver, what is the optimal
binary input distribution in each data slot?" In light of the discussion in section 3.3, we
provide a partial answer by confining our interest to BPSK (optimal for full CSI) and OOK
(optimal for no CSI). We will consider the form of OOK for which p_ℓ = 1/2. The discussion
and results in this section are based largely on our results in [29], where proofs were
omitted due to lack of space. Next, we provide the transitional value of the faded SNR
κ_ℓ = g(ω_ℓ), above which OOK is optimal, and below which BPSK is optimal:

Design Rule. The transitional faded SNR κ_ℓ for the ℓth path is found by equating
equation 11 with equation 12 and solving for κ_ℓ. This yields a third-order polynomial.
Retaining the relevant root yields the transitional SNR in the ℓth path to be:

    \kappa_\ell = g(\omega_\ell) = \frac{(a+b)^{1/3} + (a-b)^{1/3} - 2\,(4 - 10\omega_\ell + 3\omega_\ell^2)}{3\,(2-\omega_\ell)^2\,(1-\omega_\ell)},

where the auxiliary quantity a (distinct from the Doppler parameter) is

    a = 81\omega_\ell^6 - 468\omega_\ell^5 + 828\omega_\ell^4 - 640\omega_\ell^3 + 624\omega_\ell^2 - 192\omega_\ell + 64,

and where

    b = 6\sqrt{3}\,(\omega_\ell^2 - 2\omega_\ell)^2 \sqrt{61\omega_\ell^4 - 208\omega_\ell^3 + 168\omega_\ell^2 - 64\omega_\ell + 16}.        (21)

Figure 20. The cutoff rate for four different energy allocation strategies: R_(0,A) is the
equal-energy cutoff rate, R_(0,B) is the two-dimensional cutoff rate, R_(0,C) is
the variable energy cutoff rate using the substitution function, and R_(0,D) is
the variable energy cutoff rate using numerical optimization. The Doppler
parameter a = 0.88, and the value of the training period is fixed, T = 15.


The function g(ω_ℓ) depends on the estimator quality, and is shown in figure 21. At the end
points, ω_ℓ ∈ {0, 1}, our results agree with existing theory:


1. Observe that g(0) = 0. Therefore, when no CSI is available, it is always better to use
OOK instead of BPSK. This is in agreement with the results of [1], which are for the
ω_ℓ = 0 case.

2. It can be verified that

    \lim_{\omega_\ell \to 1} g(\omega_\ell) = \infty,

which confirms that, when full CSI is available, BPSK is always optimal independent
of the faded SNR. This is in agreement with the well known fact that, for AWGN
channels, the use of OOK is suboptimal to BPSK for a fixed average symbol energy
constraint.

3. As expected from figure 21, it can be verified that g(ω_ℓ) is an increasing function of ω_ℓ.
For a fixed κ_ℓ, the transmitter should switch from BPSK to OOK as CSI diminishes.

Figure 21. The transitional faded SNR κ = g(ω). For larger faded SNR κ, OOK is
cutoff-rate optimal, and for smaller faded SNR, BPSK is optimal.
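
The design rule can be checked numerically. The Python sketch below is illustrative and rests on assumptions: it takes the closed form g(ω) = [(a+b)^{1/3} + (a−b)^{1/3} − 2(4 − 10ω + 3ω²)] / [3(2−ω)²(1−ω)] with a = 81ω⁶ − 468ω⁵ + 828ω⁴ − 640ω³ + 624ω² − 192ω + 64 and b = 6√3 (ω² − 2ω)² √(61ω⁴ − 208ω³ + 168ω² − 64ω + 16), and it assumes that the terms inside the logarithms of the BPSK and OOK cutoff rates are (1 + κ(1 − ω))/(1 + κ) and √(1 + 2κ(1 − ω))/(1 + κ(1 − ω/2)), respectively. Function names are hypothetical. The radicand in b becomes negative for larger ω, so complex arithmetic with principal cube roots is used and the real part is kept.

```python
import cmath
import math

def bhattacharyya_bpsk(kappa, omega):
    """Assumed term inside the log of the BPSK cutoff rate."""
    return (1.0 + kappa * (1.0 - omega)) / (1.0 + kappa)

def bhattacharyya_ook(kappa, omega):
    """Assumed term inside the log of the OOK (p = 1/2) cutoff rate."""
    return math.sqrt(1.0 + 2.0 * kappa * (1.0 - omega)) / (1.0 + kappa * (1.0 - 0.5 * omega))

def transitional_snr(omega):
    """Transitional faded SNR g(omega) for 0 <= omega < 1 (linear scale, not dB)."""
    w = omega
    a = 81*w**6 - 468*w**5 + 828*w**4 - 640*w**3 + 624*w**2 - 192*w + 64
    radicand = 61*w**4 - 208*w**3 + 168*w**2 - 64*w + 16
    b = 6.0 * math.sqrt(3.0) * (w*w - 2.0*w)**2 * cmath.sqrt(radicand)
    cube_roots = (a + b) ** (1.0 / 3.0) + (a - b) ** (1.0 / 3.0)   # principal branches
    numerator = cube_roots.real - 2.0 * (4.0 - 10.0*w + 3.0*w*w)
    return numerator / (3.0 * (2.0 - w)**2 * (1.0 - w))

if __name__ == "__main__":
    for w in (0.2, 0.5, 0.9):
        k = transitional_snr(w)
        # At the transitional SNR the two terms should coincide.
        print(w, round(k, 4), bhattacharyya_bpsk(k, w) - bhattacharyya_ook(k, w))
```

At the returned SNR the BPSK and OOK terms agree to numerical precision, and the returned value grows with ω, consistent with g(0) = 0 and g(ω) → ∞ as ω → 1.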


The  design  rule  of  equation  21  gives  an  analytic  basis  for  a  hybrid  modulation  scheme  in 
which  the  transmitter  can  select  between  BPSK  and  OOK  based  on  the  faded  SNR  and 
estimator  quality  available  at  the  receiver.  In  fSSj,  the  authors  used  capacity  as  a  metric 
and  considered  a  similar  analysis,  where  the  transmitter  was  free  to  choose  the  optimal 
binary  distribution  for  each  sub-channel  (among  all  possible  binary  distributions).  Because 
of  the  intractability  of  the  capacity  metric  and  the  input  design  rule,  a  numerical  analysis 
was  given.  Here,  we  consider  a  scheme  that  alternates  between  OOK  and  BPSK  using  only 
an analytic design rule. The cutoff rate for the BPSK/OOK hybrid modulation scheme is

    R_{0,H} = \frac{T-1}{T} - \frac{1}{T} \sum_{\ell \in \mathcal{L}} \log_2\!\left[ 1 + \frac{1 + \kappa_1 (1 - \omega_\ell)}{1 + \kappa_1} \right]
              - \frac{1}{T} \sum_{\ell \in \mathcal{L}^c} \log_2\!\left[ 1 + \frac{\sqrt{1 + 2\kappa_1 (1 - \omega_\ell)}}{1 + \kappa_1 (1 - \omega_\ell / 2)} \right],        (22)

where

    \mathcal{L} = \{ \ell : \kappa_1 \le g(\omega_\ell) \} \subseteq \{ 1, \ldots, T-1 \}        (23)

denotes the set of data slots where BPSK is optimal, and where ℒ^c denotes the set of data
slots where OOK is optimal, with ℒ ∪ ℒ^c = {1, ..., T − 1}. For the Q(1,0) and Q(∞,0)
estimators, g(ω_ℓ) is a decreasing function of ℓ. Therefore, sub-channels are initially
assigned BPSK modulation, eventually the transitional CSI level is reached, and the
remaining sub-channels are assigned OOK modulation. For the Q(1,1) estimator, g(ω_ℓ) is
largest near the end points. The leading and trailing data slots are assigned BPSK, and
the middle slots are assigned OOK, as illustrated in figure 22.

Figure 22. For the causal estimators (Q(1,0) and Q(∞,0)), initial data slots are assigned
BPSK, and the latter OOK. For the Q(1,1) estimator, the beginning and trailing
data slots are assigned BPSK, and the intermediary slots are assigned OOK.

We compare our BPSK/OOK adaptive modulation system to:

C1. The BPSK-only system of equation 12. Denote the cutoff rate of this system by
R_0,BPSK.

C2. The OOK-only system, which uses OOK with p = 1/2 in each sub-channel. Denote
the cutoff rate of this system by R_0,OOK; see equation 11.


To simplify the presentation we will consider the equal energy case where κ_0 = κ_1 = κ_av.
We start by considering the Q(1,0) estimator in detail. To evaluate equation 22, we first
seek to determine ℒ. Evaluating the threshold function yields

    \mathcal{L}^{(1,0)} = \left\{ \ell : \kappa_{av} \le g\!\left(\omega_\ell^{(1,0)}\right) = g\!\left( \frac{a^{2\ell}\, \kappa_{av}}{1 + \kappa_{av}} \right) \right\},

from which ℒ can be found explicitly for fixed values of κ_av and a (we have added the
superscript (1,0) for clarity).
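The cutoff rate R_0,H for the Q(1,0) estimator is

    R_{0,H} = \frac{T-1}{T} - \frac{1}{T} \sum_{\ell \in \mathcal{L}} \log_2\!\left[ 2 - \frac{a^{2\ell}\, \kappa_{av}^2}{(1 + \kappa_{av})^2} \right]
              - \frac{1}{T} \sum_{\ell \in \mathcal{L}^c} \log_2\!\left[ 1 + \frac{\sqrt{1 + 2\kappa_{av}\left( 1 - \frac{a^{2\ell} \kappa_{av}}{1 + \kappa_{av}} \right)}}{1 + \kappa_{av}\left( 1 - \frac{a^{2\ell}}{2} \cdot \frac{\kappa_{av}}{1 + \kappa_{av}} \right)} \right].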


For a fair comparison, each modulation scheme should use the training period T that
optimizes its cutoff rate. Indeed, for the OOK-based systems, it is possible that sending no
training data may be optimal, since the cutoff rate is non-zero when ω_ℓ = 0. We will
consider a system with a = 0.98, and use the results of table 2, which show that for a
BPSK-only system, the optimal training period at large κ_tot is T = 7 (the cutoff rate
saturates to 0.71).


In figure 23, we plot the cutoff rate of the BPSK scheme R_0,BPSK, the OOK scheme R_0,OOK,
and the BPSK/OOK adaptive modulation scheme R_0,H. For small κ_av, BPSK is optimal
for all sub-channels (from equation 21), and so the cutoff rate R_0,H is equal to R_0,BPSK.
For the intermediate values of κ_av, BPSK is optimal for the initial sub-channels, while OOK
is optimal for the latter ones. In this region, R_0,H is larger than R_0,BPSK. To find the κ_av
above which R_0,H becomes larger than R_0,BPSK, we solve for κ_av in the following equation:

    \kappa_{av} = g\!\left(\omega_{T-1}^{(1,0)}\right) = g\!\left( \frac{a^{2(T-1)}\, \kappa_{av}}{1 + \kappa_{av}} \right),        (24)

which, with a = 0.98 and T = 7, indicates that the OOK/BPSK hybrid scheme
outperforms the BPSK-only scheme starting at κ_av ≈ 9.8 dB. This is confirmed in the
figure.


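For large $\kappa_{av}$, $R_{0,h}$ is equal to $R_{0,OOK}$ since OOK outperforms BPSK in all
sub-channels. To find the $\kappa_{av}$ above which the OOK-only scheme performs as well as
the BPSK/OOK adaptive scheme, we solve for $\kappa_{av}$ in

$$\kappa_{av} = g\left(\omega_{1}^{(1,0)}\right) = g\left(\alpha^{2}\,\frac{\kappa_{av}}{1+\kappa_{av}}\right), \tag{25}$$

which yields the intersection point as $\kappa_{av} \approx 20.73$ dB.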

At any value of the faded SNR $\kappa_{av}$, the BPSK/OOK scheme performs at least as well as
the best of the BPSK-only or OOK-only approaches, and for some intermediate range of
$\kappa_{av}$, the adaptive scheme performs better than the best of either the OOK-only or
BPSK-only schemes. Note again that substantial gains are obtained by using OOK in the latter
slots.
Figure 23. A comparison of the cutoff rate for several modulation techniques for the
$Q_{(1,0)}$ estimator with $\alpha = 0.98$ and $T = 7$. $R_{0,BPSK}$ denotes BPSK only,
$R_{0,OOK}$ denotes the use of OOK with $p_\ell = 1/2$, and $R_{0,h}$ denotes the BPSK/OOK
adaptive modulation scheme.

Next, we repeat the preceding analysis for the $Q_{(\infty,0)}$ estimator. The hybrid cutoff
rate $R_{0,h}$ is again given by equation 22 and C by equation 23, where the estimator quality
is now given by $\omega_\ell^{(\infty,0)}$ (see equation 6). To find the two intersection
points, we must solve $\kappa_{av} = g(\omega_{T-1}^{(\infty,0)})$ and
$\kappa_{av} = g(\omega_{1}^{(\infty,0)})$ in a fashion analogous to equations 24 and 25. This
yields the intersection points $\kappa_{av} \approx 10.4$ dB and $\kappa_{av} \approx 20.8$ dB
(because the $Q_{(\infty,0)}$ estimator uses all past pilots and so provides better CSI, the
intersection points must be to the right of their values in the previous case). This is
confirmed in figure 24, where we plot $R_{0,h}$ for the $Q_{(\infty,0)}$ estimator for
$\alpha = 0.98$ and $T = 7$ (the choice of $T$ is again derived from table 2).

Next, we consider the $Q_{(1,1)}$ estimator in figure 25. The hybrid cutoff rate $R_{0,h}$ is
again given by equation 22 and C by equation 23, where the estimator quality is now given by
$\omega_\ell^{(1,1)}$. We take $\alpha = 0.98$ and set $T = 11$ based on table 2. For this
non-causal estimator, the left intersection point is now found by solving for $\kappa_{av}$ in
the equation $\kappa_{av} = g(\omega_\ell^{(1,1)})$ with $\ell$ taken at the central data slot;
this yields the value of $\kappa_{av}$ above which the central data slot will be assigned OOK.
To find the right intersection point, we solve $\kappa_{av} = g(\omega_\ell^{(1,1)})$ with
$\ell$ taken at the last data slot; this yields the $\kappa_{av}$ above which the last (and
also the first) data slot is assigned OOK.
Figure 24. A comparison of the cutoff rate for several modulation techniques for the
$Q_{(\infty,0)}$ estimator with $\alpha = 0.98$ and $T = 7$. $R_{0,BPSK}$ denotes BPSK only,
$R_{0,OOK}$ denotes the use of OOK with $p_\ell = 1/2$, and $R_{0,h}$ denotes the BPSK/OOK
adaptive modulation scheme.


Figure 25. A comparison of the cutoff rate for several modulation techniques for the
$Q_{(1,1)}$ estimator with $\alpha = 0.98$ and $T = 11$. $R_{0,BPSK}$ denotes BPSK only,
$R_{0,OOK}$ denotes the use of OOK with $p_\ell = 1/2$, and $R_{0,h}$ denotes the BPSK/OOK
adaptive modulation scheme.
For each of the three estimators considered, we see that the hybrid scheme captures the
optimality of BPSK at low SNR (resulting in an energy saving of roughly 2 to 3 dB versus the
pure OOK scheme), and the optimality of OOK at high SNR (allowing the cutoff rate to
saturate to its maximum possible value of $(T-1)/T$ as $\kappa_{av} \to \infty$). In addition,
there is a region, located between the two intersection points as discussed above, for which
the hybrid scheme outperforms the best of the pure OOK and pure BPSK schemes at each
value of $\kappa_{av}$.


7.  Optimal  Training  for  the  Jakes  Model 


In  this  section,  we  will  consider  the  cutoff  rate  for  the  Jakes  channel  correlation  model  [18]. 
The  Jakes  model  is  considered  to  be  an  excellent  description  of  the  channel  correlation  of 
real-world  communication  channels.  However,  the  Jakes  model  is  often  not  amenable  to 
analysis,  leading  to  use  of  autoregressive  models  (e.g.,  the  Gauss  Markov  AR(1)  model)  in 
its  place.  In  this  section,  we  consider  the  effect  on  cutoff  rate  Ro  if  the  Jakes  model 
describes  the  channel  correlation.  We  also  assess  the  value  of  the  optimal  energy  and 
training  period  rules  designed  for  the  AR(1)  model  when  they  are  instead  applied  to  the 
Jakes  model.  In  this  section,  we  will  consider  BPSK  signaling. 


The observation under the Jakes model is the same as before, from equation 1,

$$y'_k = \sqrt{E_k}\, h'_k s_k + n'_k,$$

where $h'_k$ is again a zero-mean complex Gaussian random process with variance $\sigma_h^2$,
but now has the channel correlation $R_J(\tau) = J_0(2\pi f_D T_D |\tau|)$, where $J_0(\cdot)$
is the zeroth-order Bessel function of the first kind, $f_D$ is the maximum Doppler frequency,
and $T_D$ is the symbol duration. After interleaving, estimation, and de-interleaving, the
observation equation becomes (rewriting equation 8)

$$y_k = \sqrt{E_{\lceil [k]/T \rceil}}\, \hat{h}_k s_k + \sqrt{E_{\lceil [k]/T \rceil}}\, \tilde{h}_k s_k + n_k,$$


and the corresponding cutoff rate for the $Q_{(x,y)}$ estimator under the Jakes model is given by

$$R_{0,J} = -\frac{1}{T}\sum_{\ell=1}^{T-1}\log_2\left[1 - \frac{1}{2}\,\frac{\kappa_1\,\omega_{\ell,J}^{(x,y)}}{1+\kappa_1}\right],$$

where $\omega_{\ell,J}^{(x,y)}$ denotes the estimator quality for the $Q_{(x,y)}$ estimator
under the Jakes model.

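In appendix A, the estimator quality equations are found by letting $R_h(\tau) = R_J(\tau)$.
For the $Q_{(1,1)}$ estimator, this gives

$$\omega_{\ell,J}^{(1,1)} = \left(r^2(\ell) + r^2(T-\ell)\right)\left(\kappa_0^2 + \kappa_0\right) + 2\kappa_0^2\, r(\ell)\, r(T-\ell)\, J_0(2\pi f_D T_D T),$$

where

$$r(k) = \frac{J_0(2\pi f_D T_D k)\,(\kappa_0 + 1) - J_0(2\pi f_D T_D T)\, J_0\left(2\pi f_D T_D (T-k)\right)\kappa_0}{(\kappa_0 + 1)^2 - \kappa_0^2\, J_0^2(2\pi f_D T_D T)}.$$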

Next,  we  will  test  our  design  paradigms  for  energy  allocation  and  the  training  period  that 
were  derived  from  the  AR(1)  model,  on  the  Jakes  model.  Do  our  designs  obtain 
near-optimality  in  the  Jakes  channel  correlation  model?  If  they  do,  then  the  results  are 
useful,  as  the  Jakes  model  is  taken  to  be  an  excellent  model  of  real-world  wireless  channels. 
Next,  we  will  compare  the  value  of  the  Jakes  cutoff  rate  with  the  AR(1)  cutoff  rate.  It  is 
desirable  that  they  be  in  close  agreement,  as  this  will  validate  the  insights  gained  from 
studying  the  cutoff  rate  curves  based  on  the  AR(1)  model  in  the  previous  analysis. 

In comparing the Jakes and AR(1) models, an important issue arises: what value of $\alpha$
(used to measure Doppler in the AR(1) model) corresponds to a fixed value of $f_D T_D$ (used
to measure Doppler in the Jakes model)? One plausible way to compare models is to use a
weighted mean square error distortion metric so that

$$\alpha(T_D f_D, M) = \arg\min_{\alpha} \sum_{\tau=1}^{M} v_\tau \left| R_J(\tau) - R_A(\tau) \right|^2, \tag{26}$$

where $R_A(\tau) = \alpha^{|\tau|}$, $R_J(\tau) = J_0(2\pi f_D T_D \tau)$, $M$ is the number of
lags over which we wish to match the two correlation functions, and $v_\tau > 0$ are the
weights. This weighting is certainly necessary. For example, for causal estimators, earlier
"lags" contribute more to the cutoff rate than later lags. However, there is a problem with
this approach. Note that $\alpha(T_D f_D, M)$ changes as the number of lags of interest ($M$)
changes. More importantly, how do we determine the value of the weights $v_\tau$ in
equation 26?
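The matching rule of equation 26 can be sketched numerically. The snippet below is only an illustration under stated assumptions: the weights default to uniform (the text leaves them open), the minimization is a plain grid search over $\alpha \in (0,1)$, and the helper names `bessel_j0` and `best_fit_alpha` are hypothetical. $J_0$ is evaluated from its integral representation so that only the standard library is needed.

```python
import math


def bessel_j0(x, n=200):
    """J_0(x) = (1/pi) * integral_0^pi cos(x sin t) dt, by the midpoint rule."""
    return sum(math.cos(x * math.sin(math.pi * (k + 0.5) / n)) for k in range(n)) / n


def best_fit_alpha(fd_td, num_lags, weights=None, grid=2000):
    """Grid search for the alpha minimizing sum_tau v_tau * (R_J(tau) - alpha**tau)**2."""
    if weights is None:
        weights = [1.0] * num_lags  # assumption: uniform weights v_tau = 1
    # Jakes correlation at lags 1..M (lag 0 contributes nothing, both models equal 1)
    jakes = [bessel_j0(2.0 * math.pi * fd_td * tau) for tau in range(1, num_lags + 1)]
    best_alpha, best_cost = None, float("inf")
    for i in range(1, grid):
        alpha = i / grid
        cost = sum(
            w * (rj - alpha ** tau) ** 2
            for tau, (w, rj) in enumerate(zip(weights, jakes), start=1)
        )
        if cost < best_cost:
            best_alpha, best_cost = alpha, cost
    return best_alpha
```

Calling `best_fit_alpha(0.1, 2)` and `best_fit_alpha(0.1, 10)` returns different values, which illustrates the dependence on $M$ noted above; a different choice of weights would shift the result again.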

Here, we will not attempt to design a universal mapping rule. Instead, we will require that
each value of the pair $(f_D T_D, T)$ be mapped to one value of $\alpha$, i.e., the same
$\alpha$ is used for all $\kappa_{av}$. We will be omniscient in how this value of $\alpha$ is
chosen. We will always pick the "best-fit" $\alpha$, as will be described in the sequel.

7.1  Energy  Allocation 

Here, we compare the optimal energy allocation $(\kappa^*_{0,J}, \kappa^*_{1,J})$ when the
Jakes model is used to the optimal allocation $(\kappa^*_{0,A}, \kappa^*_{1,A})$ when the AR(1)
model is used. As before, we impose a total energy constraint
$\kappa_0 + (T-1)\kappa_1 \le \kappa_{av} T$ and consider the value of the training period to
be fixed (the optimal training period is considered in the next section).

For the $Q_{(1,0)}$ estimator, it is easy to verify that the optimal energy allocation is the
same for both the Jakes and AR(1) models. This is because, for this estimator, the optimal
energy allocation does not depend on the channel correlation function at all. As a special
case, we note that this result also holds for the purely bandlimited fading process
considered in [31]. The optimal data energy is given by (see equation 13)

$$\kappa^*_{1,J} = \kappa^*_{1,A} = \Gamma - \sqrt{\Gamma^2 - \frac{\kappa_{tot}}{T-1}\,\Gamma}, \qquad \Gamma = \frac{\kappa_{av} T + 1}{T-2},$$

for $T > 2$. For $T = 2$, $\kappa^*_{1,J} = \kappa^*_{1,A} = \kappa_{tot}/2$. For this simple
estimator, the AR(1) model correctly predicts the optimal energy allocation for the Jakes
model.
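As a sanity check on the closed form, the sketch below compares it with a brute-force search. It rests on an assumption not spelled out at this point in the text: for the $Q_{(1,0)}$ estimator with BPSK, the energy split enters the cutoff rate only through the product $\frac{\kappa_0}{1+\kappa_0}\cdot\frac{\kappa_1}{1+\kappa_1}$ (using $\omega_\ell^{(1,0)} = R_h^2(\ell)\,\kappa_0/(1+\kappa_0)$), so maximizing that product under $\kappa_0 + (T-1)\kappa_1 = \kappa_{av}T$ gives the optimal allocation regardless of $R_h$. All function names are hypothetical.

```python
import math


def energy_product(kappa0, kappa1):
    """Part of the Q_(1,0) BPSK cutoff rate that depends on the energy split (assumed form)."""
    return (kappa0 / (1.0 + kappa0)) * (kappa1 / (1.0 + kappa1))


def optimal_data_energy(kappa_av, T):
    """Closed-form optimal per-slot data energy under kappa0 + (T-1)*kappa1 = kappa_av*T."""
    kappa_tot = kappa_av * T
    if T == 2:
        return kappa_tot / 2.0
    gamma = (kappa_tot + 1.0) / (T - 2)
    return gamma - math.sqrt(gamma * gamma - kappa_tot * gamma / (T - 1))


def brute_force_data_energy(kappa_av, T, steps=20000):
    """Grid search over kappa1 for the split that maximizes energy_product."""
    kappa_tot = kappa_av * T
    best_k1, best_val = 0.0, -1.0
    for i in range(1, steps):
        k1 = (kappa_tot / (T - 1)) * i / steps
        k0 = kappa_tot - (T - 1) * k1  # remaining energy goes to the pilot
        val = energy_product(k0, k1)
        if val > best_val:
            best_k1, best_val = k1, val
    return best_k1
```

For example, `optimal_data_energy(1.0, 7)` and `brute_force_data_energy(1.0, 7)` should agree to within the grid resolution, and neither call needs the correlation function, which is the point made in the text.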


Next, we consider the $Q_{(1,1)}$ estimator. In figure 26, we plot the cutoff rate for both the
Jakes model and the AR(1) model for two different values of the Doppler spread. We
choose a carrier frequency of 900 MHz and maximum Doppler spreads $f_D = 25$ Hz (with
$T = 15$) and $f_D = 100$ Hz (with $T = 5$); these correspond to mobile speeds of 30 km/hr and
120 km/hr, respectively [32, pp. 141-143]. The symbol period $T_D$ is 1 msec. At each value
of $\kappa_{tot}$, the cutoff rate is optimized over the energy allocation
$(\kappa_0, \kappa_1)$ separately for each model. For the AR(1) model, the best-fit $\alpha$ is
found by finding the single value of $\alpha$ that minimizes the average difference in the
cutoff rates $|R_{0,J} - R_{0,A}|$ over all $\kappa_{tot}$. We find that $f_D T_D = 0.1$
corresponds to $\alpha = 0.88$ and that $f_D T_D = 0.025$ corresponds to $\alpha = 0.99$. From
the figure, we see that, when the appropriate value of $\alpha$ is chosen, the cutoff rate of
the Jakes model closely matches that of the AR(1) model over all $\kappa_{tot}$.
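The Doppler figures quoted above follow from the standard relation $f_D = v f_c / c$. A minimal sketch of the arithmetic (the function names are illustrative, not from the report):

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def mobile_speed_kmh(doppler_hz, carrier_hz):
    """Mobile speed (km/hr) that produces a given maximum Doppler frequency."""
    return doppler_hz * SPEED_OF_LIGHT / carrier_hz * 3.6


def normalized_doppler(doppler_hz, symbol_period_s):
    """Normalized Doppler f_D * T_D used to parameterize the Jakes model."""
    return doppler_hz * symbol_period_s
```

With a 900 MHz carrier, 25 Hz and 100 Hz map to roughly 30 km/hr and 120 km/hr, and with $T_D = 1$ msec they give $f_D T_D = 0.025$ and $0.1$.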

We note that the best-fit value of $\alpha$ is chosen using the $|R_{0,J} - R_{0,A}|$
criterion, and thus, the difference in the associated training energies
$|\kappa^*_{0,J} - \kappa^*_{0,A}|$ may be large, as is evident from figure 27. Next, we ask
the following question: if we allocate training energy to the Jakes model based on the optimal
training energy for the AR(1) model (i.e., we let $\kappa_{0,J} = \kappa^*_{0,A}$), what is the
effect upon the cutoff rate? In figure 28, we plot the cutoff rate for the Jakes model when
$\kappa_{0,J} = \kappa^*_{0,J}$ (denote this cutoff rate by $R^*_{0,J}$) and when
$\kappa_{0,J} = \kappa^*_{0,A}$ (denote this cutoff rate by $R_{0,J}$). The system parameters
are the same as in the previous case. We see that there is virtually no loss in the cutoff
rate when the energy allocation based on the AR(1) model is used to dictate the energy
allocation for the Jakes model, provided that an appropriate value of $\alpha$ is chosen.


7.2  Training  Period 


In this section we consider the optimal training period $T^*$ under the Jakes model. We will
assume that $f_D T_D \ll 1$, so that we are interested only in the first decreasing
"half-lobe" of the Jakes function. In appendix G, it is shown that for the $Q_{(1,0)}$
estimator, a lower bound on the optimal training period in the high SNR
($\kappa_{tot} \to \infty$) scenario, under the Jakes model, is again given by

$$T_{B,(1,0)} = \arg\min_{T} \prod_{\ell=1}^{T-1} \left[ 1 - \frac{J_0^2(2\pi f_D T_D \ell)}{2} \right]^{1/T}. \tag{27}$$
Figure 26. The cutoff rate for the Jakes model $R_{0,J}$ and AR(1) model $R_{0,A}$ for two
sets of parameters: $f_D = 25$ Hz, $\alpha = 0.99$, $T = 15$ and $f_D = 100$ Hz,
$\alpha = 0.88$, $T = 5$. The symbol period is $T_D = 1$ msec and the carrier frequency is
$f_c = 900$ MHz.


Figure 27. The training energy ratio $\kappa^*_0/\kappa_{tot}$ for the Jakes model and AR(1)
model for two sets of parameters: $f_D = 25$ Hz, $\alpha = 0.99$, $T = 15$ and $f_D = 100$ Hz,
$\alpha = 0.88$, $T = 5$. The symbol period is $T_D = 1$ msec and the carrier frequency is
$f_c = 900$ MHz.


Also  in  the  appendix,  it  is  shown  that  this  lower  bound  is  exact  at  high  SNR,  and  valid  for 
any  channel  correlation  function  that  decreases  monotonically  in  the  range  of  interest. 
Figure 28. The cutoff rate for the Jakes model for two sets of parameters: $f_D = 25$ Hz,
$\alpha = 0.99$, $T = 15$ and $f_D = 100$ Hz, $\alpha = 0.88$, $T = 5$. $R^*_{0,J}$ denotes the
cutoff rate using the true optimal energy allocation. $R_{0,J}$ denotes the cutoff rate when
the optimal energy allocation based on the AR(1) model is used instead.


Even without the assumption $f_D T_D \ll 1$, numerical evidence indicates that equation 27 is
valid. The lower bound is illustrated in table 3. Again, we choose $T_D = 1$ msec and a
carrier frequency of 900 MHz. The values of $f_D T_D$ shown in the table are 0.1, 0.05, 0.025,
and 0.01 and correspond to mobile speeds of 120, 60, 30, and 12 km/hr, respectively. If we
approximate $f_D T_D = 0.1$ and $f_D T_D = 0.01$ in the Jakes model as being equivalent to
$\alpha = 0.80$ and $\alpha = 0.99$ in the AR(1) model (based on our analysis in the previous
subsection), we see that both the AR(1) model and Jakes model result in very similar
training period designs. The value of the lower bound is nearly the same for the two
models. Additionally, we see that in both models, the bound is tight even at low to
moderate SNR, with this tightness increasing as the Doppler spread increases. Therefore,
we conclude that for the $Q_{(1,0)}$ estimator, the training period results derived from the
AR(1) model are also applicable when the Jakes model describes the channel correlation.
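The bound of equation 27 is easy to evaluate numerically. The sketch below is an illustration, not the report's code: it assumes the high-SNR BPSK cutoff rate for the $Q_{(1,0)}$ estimator is $-\frac{1}{T}\sum_{\ell=1}^{T-1}\log_2\left(1 - R_h^2(\ell)/2\right)$, so that minimizing the product in equation 27 is the same as maximizing this rate. The search limit `t_max` and all function names are hypothetical choices.

```python
import math


def bessel_j0(x, n=200):
    """J_0(x) = (1/pi) * integral_0^pi cos(x sin t) dt, by the midpoint rule."""
    return sum(math.cos(x * math.sin(math.pi * (k + 0.5) / n)) for k in range(n)) / n


def jakes_corr(fd_td):
    """Normalized Jakes correlation R_J(lag) = J_0(2 pi f_D T_D |lag|)."""
    return lambda lag: bessel_j0(2.0 * math.pi * fd_td * abs(lag))


def ar1_corr(alpha):
    """Normalized AR(1) correlation R_A(lag) = alpha**|lag|."""
    return lambda lag: alpha ** abs(lag)


def high_snr_cutoff_rate(corr, T):
    """Assumed high-SNR BPSK cutoff rate (bits per channel use) for Q_(1,0)."""
    return -sum(math.log2(1.0 - 0.5 * corr(l) ** 2) for l in range(1, T)) / T


def training_period_bound(corr, t_max=30):
    """Training period maximizing the high-SNR rate, i.e. the arg min in equation 27."""
    return max(range(2, t_max + 1), key=lambda T: high_snr_cutoff_rate(corr, T))
```

Evaluating `training_period_bound(jakes_corr(f))` for the four $f_D T_D$ values of table 3 gives the $T_{B,(1,0)}$ column, and `training_period_bound(ar1_corr(0.98))` gives the AR(1) period used earlier for $\alpha = 0.98$, with a saturated rate close to the 0.71 quoted there.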

Based  on  the  preceding  discussion,  we  conclude  that  an  analysis  of  the  AR(1)  model  results 
in  useful  insights  on  how  to  allocate  energy,  choose  the  training  period,  and  determine  a 
range  of  reliable  transmission  rates.  Furthermore,  these  results  are  useful  even  if  the  “true” 
wireless  channel  (i.e.,  the  channel  encountered  in  practice)  is  described  well  by  the  Jakes 
model. 
Table 3. Comparison of the optimal training period $T^*_{(1,0)}$ to the lower bound
$T_{B,(1,0)}$ for several different values of $f_D T_D$ and $\kappa_{av}$ under the Jakes model.

f_D T_D    kappa_av    T*_(1,0)    T_B,(1,0)
0.1        1           3           2
0.1        10          2           2
0.1        100         2           2
0.05       1           4           3
0.05       10          4           3
0.05       100         3           3
0.025      1           7           5
0.025      10          5           5
0.025      100         5           5
0.01       1           14          9
0.01       10          10          9
0.01       100         9           9


So far we have considered two particular channel correlation models (the AR(1) and Jakes)
and three MMSE estimators ($Q_{(1,0)}$, $Q_{(\infty,0)}$, $Q_{(1,1)}$). In the next section,
we generalize the framework to an arbitrary channel correlation function
$R_h(\tau) = \frac{1}{\sigma_h^2} E\left[h'_k\, h'^{*}_{k+\tau}\right]$ and to an entire class
of MMSE estimators.


8.  Generalized  Channel  and  Estimation  Model 


In  this  section,  we  will  generalize  the  channel  correlation  model  and  the  channel  estimation 
model  to  show  how  our  previous  results  follow  as  special  cases  of  a  more  general  system 
model.  For  the  sake  of  completeness,  we  restate  the  channel  observation,  state,  and 
estimation  expressions  in  general  terms.  We  will,  however,  consider  only  binary  input 
strategies  and  consider  the  case  where  all  data  transmissions  are  assigned  equal  energy. 

The channel observation is given by

$$y'_k = \sqrt{E_k}\, h'_k s_k + n'_k,$$

where the coded input $s_k$ is selected from a binary signal set $S = \{A, B\}$ (i.e.,
$s_k \in S$) and subject to a unit average-energy constraint: $p|A|^2 + (1-p)|B|^2 = 1$, where
$p$ is the probability of transmitting $A$. The sequence
$n'_k \sim \mathcal{CN}(0, \sigma_n^2)$ describes AWGN, and the transmission energy used at
time $k$ is $E_k |s_k|^2$. The Gaussian channel $h'_k \sim \mathcal{CN}(0, \sigma_h^2)$ has a
general correlation function
$R_h(\tau) = \frac{1}{\sigma_h^2} E\left[h'_k\, h'^{*}_{k+\tau}\right]$ and describes
time-correlated Rayleigh fading.

Codewords of length $N' = N(T-1)$, $N \in \mathbb{Z}$, are used.

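Training is sent periodically once every $T$ transmissions, i.e., during the time slots
$k = mT$. At each time $mT + \ell$, an MMSE estimate of the channel is made at the receiver
using some subset $\mathcal{N}$ of past (and possibly future) training symbol observations, so
that

$$\hat{h}'_{mT+\ell} = E\left[ h'_{mT+\ell} \mid \{y'_{nT}\},\ n \in \mathcal{N} \subseteq \mathbb{Z} \right]. \tag{28}$$

The use of an MMSE estimator implies that
$\hat{h}'_{mT+\ell} \sim \mathcal{CN}(0, \hat{\sigma}_\ell^2)$, where $\hat{\sigma}_\ell^2$ is
the estimator variance. From orthogonality, it follows that the estimation error satisfies
$\tilde{h}'_{mT+\ell} \sim \mathcal{CN}(0, \sigma_h^2 - \hat{\sigma}_\ell^2)$. That is,
$\hat{h}'_{mT+\ell}$ and $\tilde{h}'_{mT+\ell}$ are independent. To characterize the
performance of a particular estimator, we will define the estimator quality $\omega_\ell$ as

$$\omega_\ell \triangleq \hat{\sigma}_\ell^2 / \sigma_h^2. \tag{29}$$

Note that orthogonality implies that $0 \le \omega_\ell \le 1$.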

Denoting the estimation error at time $k$ by $\tilde{h}'_k$, the system equation can be
written as

$$y'_k = \sqrt{E_{\lceil [k]/T \rceil}}\, h'_k s_k + n'_k
      = \sqrt{E_{\lceil [k]/T \rceil}}\, \hat{h}'_k s_k + \sqrt{E_{\lceil [k]/T \rceil}}\, \tilde{h}'_k s_k + n'_k,$$

where we have assumed that the energy allocation for all data transmissions $E_1$ is constant
due to practical constraints (e.g., peak-to-average-power ratio specifications and transmitter
complexity). The input $s_k$ is now selected from a complex signal set
$S_{[k]} = \{A_{[k]}, B_{[k]}\}$ and subject to a unit average-energy constraint:
$p_{[k]} |A_{[k]}|^2 + (1 - p_{[k]}) |B_{[k]}|^2 = 1$ for all $k \ne mT$. In the training
slots, we assume, without loss of estimator performance, that $S_0 = \{+1\}$.

We assume that perfect interleaving is performed at the transmitter, and that channel
estimation is done before de-interleaving at the receiver. The observation equation is now

$$y_k = \sqrt{E_{\lceil [k]/T \rceil}}\, h_k s_k + n_k,$$

where $h_k$ is an i.i.d. sequence representing the interleaved channel, and where
$n_k \sim \mathcal{CN}(0, \sigma_n^2)$ is another AWGN sequence. Writing this in terms of the
channel estimate and error,

$$y_k = \sqrt{E_{\lceil [k]/T \rceil}}\, h_k s_k + n_k
      = \sqrt{E_{\lceil [k]/T \rceil}}\, \hat{h}_k s_k + \sqrt{E_{\lceil [k]/T \rceil}}\, \tilde{h}_k s_k + n_k. \tag{30}$$

Interleaving implies that $\hat{h}_k$ and $\tilde{h}_k$ are independent sequences in $k$, and
also with respect to each other. However, interleaving does not change the marginal
statistics of the channel estimate and estimation error:
$\hat{h}_{mT+\ell} \sim \mathcal{CN}(0, \hat{\sigma}_\ell^2)$ and
$\tilde{h}_{mT+\ell} \sim \mathcal{CN}(0, \sigma_h^2 - \hat{\sigma}_\ell^2)$.
Furthermore, $\hat{h}_{mT+\ell}$ and $\tilde{h}_{mT+\ell}$ are independent. The estimator
quality is defined by equation 29 as before. Finally, we assume that codewords are decoded
using the ML-detector which treats $s_{mT+\ell}$ as the channel input and the pair
$(y_{mT+\ell}, \hat{h}_{mT+\ell})$ as the channel output.


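The cutoff rate for generalized binary transmission, given the front-end above, is given by
(see appendix D with $\kappa_\ell = \kappa_1$)

$$R_0 = -\frac{1}{T} \sum_{\ell=1}^{T-1} \log_2 \min_{p_\ell |A_\ell|^2 + (1-p_\ell)|B_\ell|^2 = 1}
\left\{ 1 + 2 p_\ell (1 - p_\ell) \left[
\frac{\sqrt{1 + \kappa_1 (1-\omega_\ell) |A_\ell|^2}\ \sqrt{1 + \kappa_1 (1-\omega_\ell) |B_\ell|^2}}
     {1 + \frac{\kappa_1}{2} (1-\omega_\ell) \left(|A_\ell|^2 + |B_\ell|^2\right) + \frac{\kappa_1}{4}\, \omega_\ell\, |A_\ell - B_\ell|^2}
- 1 \right] \right\},$$

where $\kappa_1 = \sigma_h^2 E_1 / \sigma_n^2$ was previously defined as the faded SNR during
data transmissions. Additionally, we define $\kappa_0 = \sigma_h^2 E_0 / \sigma_n^2$ as the
faded training energy.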
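To make the structure of this expression concrete, the following sketch evaluates the bracketed quantity and the resulting per-slot rate for a fixed binary input, rather than carrying out the minimization over $(p_\ell, A_\ell, B_\ell)$. It assumes real-valued or complex scalars for $A$ and $B$ that already satisfy the unit average-energy constraint; the function names and the `BPSK` and `OOK` tuples are illustrative.

```python
import math


def pairwise_factor(kappa1, omega, a, b):
    """Ratio inside the brackets for the signal pair {a, b} at estimator quality omega."""
    num = math.sqrt(1.0 + kappa1 * (1.0 - omega) * abs(a) ** 2) * math.sqrt(
        1.0 + kappa1 * (1.0 - omega) * abs(b) ** 2
    )
    den = (
        1.0
        + 0.5 * kappa1 * (1.0 - omega) * (abs(a) ** 2 + abs(b) ** 2)
        + 0.25 * kappa1 * omega * abs(a - b) ** 2
    )
    return num / den


def slot_rate(kappa1, omega, p, a, b):
    """Per-slot term -log2(1 + 2p(1-p)(factor - 1)) for a fixed input (p, a, b)."""
    return -math.log2(1.0 + 2.0 * p * (1.0 - p) * (pairwise_factor(kappa1, omega, a, b) - 1.0))


def cutoff_rate(kappa1, omegas, inputs):
    """Sum the per-slot terms over the T-1 data slots and normalize by T."""
    T = len(omegas) + 1
    return sum(slot_rate(kappa1, w, *inp) for w, inp in zip(omegas, inputs)) / T


BPSK = (0.5, 1.0, -1.0)            # p = 1/2, A = +1, B = -1
OOK = (0.5, math.sqrt(2.0), 0.0)   # p = 1/2, A = sqrt(2), B = 0 (unit average energy)
```

With no CSI (`omega = 0`) the BPSK term is zero while the OOK term is positive, and with full CSI (`omega = 1`) BPSK gives the larger term, which matches the qualitative behaviour described in this report. A hybrid scheme corresponds to passing a per-slot mix of `BPSK` and `OOK` in `inputs`.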


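Next, we compute the estimator quality for three particular estimators (derivations are
given in appendix A). Consider the following estimators for $1 \le \ell \le T-1$:


G1. The $Q_{(1,0)}$ estimator for which $\mathcal{N} = \{m\}$. The channel estimate and the
estimator quality are given by

$$\hat{h}'_{mT+\ell} = \frac{\sqrt{E_0}\, \sigma_h^2\, R_h(\ell)}{E_0 \sigma_h^2 + \sigma_n^2}\, y'_{mT}
\quad \text{and} \quad
\omega_\ell^{(1,0)} = R_h^2(\ell)\, \frac{\kappa_0}{1 + \kappa_0}.$$
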

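G2. The $Q_{(\infty,0)}$ estimator for which $\mathcal{N} = \{m, m-1, m-2, \ldots\}$. The
channel estimate and the estimator quality are given by

$$\hat{h}'_{mT+\ell} = \frac{1}{\sqrt{E_0}} \sum_{\nu=0}^{\infty} y'_{(m-\nu)T}\, \gamma_{\nu,\ell}
\quad \text{and} \quad
\omega_\ell^{(\infty,0)} = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} R_h(nT + \ell)\, R_h(mT + \ell)\, z(|n - m|),$$

where $z(\tau) = Z^{-1}\left\{ \dfrac{1}{Z[R_h(\tau T)] + \frac{1}{\kappa_0}} \right\}$
and $\gamma_{\nu,\ell} = \sum_{n=0}^{\infty} R_h(nT + \ell)\, z(|n - \nu|)$.
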
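G3. The $Q_{(1,1)}$ estimator for which $\mathcal{N} = \{m, m+1\}$. The channel estimate and
the estimator quality are given by

$$\hat{h}'_{mT+\ell} = \frac{\sqrt{\kappa_0}\, \sigma_h}{\sigma_n} \left[ r(\ell)\, y'_{mT} + r(T-\ell)\, y'_{(m+1)T} \right],$$

$$\omega_\ell^{(1,1)} = \left(r^2(\ell) + r^2(T-\ell)\right)\left(\kappa_0^2 + \kappa_0\right) + 2 \kappa_0^2\, r(\ell)\, r(T-\ell)\, R_h(T),$$

where

$$r(k) = \frac{R_h(k)\,(\kappa_0 + 1) - R_h(T)\, R_h(T - k)\, \kappa_0}{(\kappa_0 + 1)^2 - \kappa_0^2\, R_h^2(T)}.$$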
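The single-pilot and two-pilot estimator qualities can be checked against each other numerically. The sketch below implements the $Q_{(1,0)}$ and $Q_{(1,1)}$ expressions for an arbitrary normalized correlation function and compares the $Q_{(1,1)}$ form with a direct evaluation of $c^H C_{yy}^{-1} c / \sigma_h^2$ for the two-pilot case, written out by hand for the $2 \times 2$ covariance. The function names and the AR(1) example correlation are assumptions chosen for illustration.

```python
def quality_q10(l, kappa0, corr):
    """Estimator quality of the single-pilot estimator Q_(1,0)."""
    return corr(l) ** 2 * kappa0 / (1.0 + kappa0)


def r_weight(k, T, kappa0, corr):
    """Weight r(k) appearing in the two-pilot estimator Q_(1,1)."""
    den = (kappa0 + 1.0) ** 2 - (kappa0 * corr(T)) ** 2
    return (corr(k) * (kappa0 + 1.0) - corr(T) * corr(T - k) * kappa0) / den


def quality_q11(l, T, kappa0, corr):
    """Estimator quality of Q_(1,1), written in terms of r(l) and r(T-l)."""
    rl = r_weight(l, T, kappa0, corr)
    rt = r_weight(T - l, T, kappa0, corr)
    return (rl * rl + rt * rt) * (kappa0 ** 2 + kappa0) + 2.0 * kappa0 ** 2 * rl * rt * corr(T)


def quality_q11_direct(l, T, kappa0, corr):
    """Same quantity from the LMMSE formula c^H C^{-1} c / sigma_h^2 (2x2 inverse by hand)."""
    a, b, rho = corr(l), corr(T - l), corr(T)
    det = (kappa0 + 1.0) ** 2 - (kappa0 * rho) ** 2
    return kappa0 * ((kappa0 + 1.0) * (a * a + b * b) - 2.0 * kappa0 * rho * a * b) / det


def ar1(alpha):
    """Example correlation R_A(lag) = alpha**|lag|."""
    return lambda lag: alpha ** abs(lag)
```

For `corr = ar1(0.98)`, `T = 11`, and `kappa0 = 10`, the two $Q_{(1,1)}$ evaluations agree to rounding error, the quality is symmetric about the middle of the frame, and it is never below the $Q_{(1,0)}$ value, as expected when a second pilot is added.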


In this section, we have shown how to formulate the cutoff rate for a general channel
correlation function $R_h(\tau)$ and for a general MMSE estimator. By letting
$R_h(\tau) = R_A(\tau) = \alpha^{|\tau|}$ we are able to derive the results for the AR(1)
model. Similarly, by letting $R_h(\tau) = R_J(\tau) = J_0(2\pi f_D T_D |\tau|)$ we are able to
derive the results for the Jakes model. Additionally, we see that it is easy to consider an
entire class of MMSE estimators for which our cutoff rate metric holds, each defined by the
pilot set $\mathcal{N}$, or equivalently, its estimator quality $\omega_\ell$.


9.  Discussion  and  Summary 


In this report we have considered the optimal allocation of resources (training energy and
training frequency) for correlated fading channels when partial CSI is available at the
receiver through periodic training. We have used the channel cutoff rate as our
optimization metric. Mainly, we have assumed that the transmitter has perfect knowledge
of the channel Doppler spread, but have also treated the case where the transmitter's
knowledge is incorrect.

First, we reviewed the Gauss-Markov correlated fading channel and discussed a periodic
training-based channel estimation scheme which provides partial CSI to the receiver by
taking an MMSE estimate of the channel based on some subset of the received pilots.
Three different MMSE estimators (i.e., three different pilot subsets) were considered, and
the characteristic estimator quality of each estimator was given.

Next, we derived the cutoff rate $R_0$ for our training-based front end. Although our
emphasis is on the Gauss-Markov channel, this cutoff rate metric (which is parameterized
by the estimator quality, SNR, and training period) holds for any channel correlation
function. We focused on binary signaling and noted that OOK is optimal when no CSI is
available, whereas BPSK is optimal when full CSI is available. To study the intermediate
region of partial CSI, we derived a design rule which gives, as a function of the CSI and
SNR available at the receiver, the optimal input distribution (OOK or BPSK). We have
shown that adaptively switching from BPSK to OOK as CSI diminishes (or as SNR
increases) results in dramatic gains in the cutoff rate versus a static (pure OOK or pure
BPSK) approach.

Next, we confined our interest to BPSK and determined the optimal energy allocation
between training and data for each of the three estimators under consideration. Initially,
all data slots were required to have the same energy. For the $Q_{(1,0)}$ and
$Q_{(\infty,0)}$ estimators, exact expressions for the optimal training energy were given
(implicitly for the $Q_{(\infty,0)}$ estimator). For the $Q_{(1,1)}$ estimator, analytic
bounds were given at high SNR, and were supplemented by numerical simulations. For each of
the three estimators, we gave a lower bound on the optimal training period that is exact at
high SNR, and gave several relations that describe the relative lengths of the training
period amongst the three estimators. We have shown that optimizing training over the energy
allocation and training period is worthwhile; gains of up to 4 dB are to be had versus an
un-optimized (but quite reasonable) training approach.

We also considered energy allocation when each data slot may have a different amount of
energy allocated to it, resulting in a $T$-dimensional energy allocation problem. We gave an
analytic solution for the optimal energy vector which is exact in several limiting senses. We
have shown that, if a reasonable (close to optimal) value of $T$ can be chosen at the
transmitter, then there is little gain in a $T$-dimensional energy allocation in place of the
simpler 2-D energy allocation. However, if a "loose" value of $T$ is chosen, the
$T$-dimensional energy allocation results in significant gain to the cutoff rate. Finally, we
tested the validity of our design paradigms for energy allocation, training period, and
cutoff rate for the AR(1) model, by using these same design rules when the channel
correlation is described by the popular Jakes correlation model. We find that the cutoff
rate predicted by the AR(1) model is extremely close to the corresponding cutoff rate for the
Jakes model when an equivalent value of the Doppler spread is used in each model.
Furthermore, we find that the energy allocation dictated by the AR(1) model can be used
in the Jakes model with little loss in the cutoff rate. Although we have focused on the
AR(1) and Jakes models, our results (see section 8) hold for general channel correlation
models, including the perfectly bandlimited fading channel.

There are many avenues of future work. Using average capacity as a metric, and assuming
a perfectly bandlimited Doppler spectrum, optimal training frequency and energy
allocation were considered in [31]. Extensions of our results to (possibly redundantly)
precoded transmissions are of interest. Given the wide-sense stationarity of the channel,
periodic placement of pilots is reasonable, but what is the optimal periodic pilot scheme?

In the context of minimizing the maximum mean square estimation error, single pilot
placement has been shown to be optimal for the Gauss-Markov model [10]. However, it is
not clear that the same result holds for the cutoff rate, for it is precisely the data slot with
the largest mean square error that contributes least to the cutoff rate. This motivates the
study of other periodic placement schemes. Next, it is expected that for fast-fading
channels, superimposed training will outperform a periodic placement approach (an MMSE
and BER analysis was given in [11] for the AR(1) model). A characterization of the region
in which a form of superimposed training outperforms periodic placement is an interesting
topic of further research. Also, the results in this report have been primarily for binary
signaling (motivated by the popularity of binary signaling techniques, particularly at low
SNR and for low complexity systems). Still, it is of interest to determine how the cutoff
rate is affected by higher order constellations. Indeed, the cutoff rate expressions provided
in this report can easily be extended to arbitrary signal constellations, although a closed
form expression rarely exists for constellations of order larger than two.
Other extensions of interest include: a generalization of the cutoff rate analysis to the
MIMO case when the receiver has partial CSI; an analysis of the merits of PSAM when
on-off keying is used (i.e., is PSAM-type training beneficial in this scenario? A partial
answer was given in [26]); and a Doppler analysis when the transmitter has a statistical
estimate of $\alpha$ (rather than a deterministic estimate, as given in section 4.4).
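A. Estimator Quality Equations (Sections 2.2, 7, and 8)


Derivation of equation 5, The $Q_{(1,0)}$ estimator


The pilot observation and channel state are jointly Gaussian with correlation
$E\left[h'_{mT+k} (y'_{mT})^*\right] = \sqrt{E_0}\, \sigma_h^2 R_h(k)$. The conditional
expectation of one Gaussian vector given another is evaluated in, e.g., [20]. Evaluating
$\hat{h}'_{mT+\ell} = E\left[h'_{mT+\ell} \mid y'_{mT}\right]$ we obtain:

$$\hat{h}'_{mT+\ell} = \frac{\sqrt{E_0}\, \sigma_h^2\, R_h(\ell)}{E_0 \sigma_h^2 + \sigma_n^2}\, y'_{mT},
\qquad
\omega_\ell^{(1,0)} = R_h^2(\ell)\, \frac{\kappa_0}{1 + \kappa_0}.$$
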

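Derivation of equation 6, The $Q_{(\infty,0)}$ estimator


Let $y = [y'_0, y'_{-T}, \ldots, y'_{-(M-1)T}]^t$ denote the pilot observations (taking
$m = 0$ without loss of generality). Then, from [20, pp. 508-509], the $Q_{(M,0)}$ estimator
is described by

$$\hat{h}'_{mT+\ell} = E\left[h'_{mT+\ell} \mid y\right] = c_\ell^H C_{yy}^{-1}\, y \quad \text{and} \tag{31}$$

$$\hat{\sigma}_\ell^2 = c_\ell^H C_{yy}^{-1}\, c_\ell, \tag{32}$$

where

$(C_{yy})_{ij} = (E[y y^H])_{ij} = E_0 \sigma_h^2 R_h(|i - j| T) + \sigma_n^2\, \delta(|i - j|)$, and
$(c_\ell)_j = E\left[h'_{mT+\ell}\, y_j^*\right] = \sqrt{E_0}\, \sigma_h^2 R_h(jT + \ell)$.

Next, define the function

$$z(\tau) \triangleq Z^{-1}\left\{ \frac{1}{Z[R_h(T\tau)] + \frac{1}{\kappa_0}} \right\}. \tag{33}$$

As $M \to \infty$, spectral factorization can be used to show that

$$(C_{yy}^{-1})_{ij} = \frac{1}{E_0 \sigma_h^2}\, z(|i - j|) \tag{34}$$

under mild conditions on $R_h(T\tau)$. Substituting equation 34 into 31 and 32 we find that the
estimate and estimator quality for the $Q_{(\infty,0)}$ estimator are given by

$$\hat{h}'_{mT+\ell} = \frac{1}{\sqrt{E_0}} \sum_{\nu=0}^{\infty} y'_{(m-\nu)T}\, \gamma_{\nu,\ell} \quad \text{and} \tag{35}$$

$$\omega_\ell^{(\infty,0)} = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} R_h(nT + \ell)\, R_h(mT + \ell)\, z(|n - m|), \tag{36}$$

where $\gamma_{\nu,\ell} = \sum_{n=0}^{\infty} R_h(nT + \ell)\, z(|n - \nu|)$.
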

n=0 

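Derivation of equation 7, The $Q_{(1,1)}$ estimator

Observe that

$$\begin{bmatrix} h'_{mT+\ell} \\ y'_{mT} \\ y'_{(m+1)T} \end{bmatrix}
\sim \mathcal{CN}\left( 0,
\begin{bmatrix}
\sigma_h^2 & R_h(\ell) \sqrt{E_0}\, \sigma_h^2 & R_h(T-\ell) \sqrt{E_0}\, \sigma_h^2 \\
R_h(\ell) \sqrt{E_0}\, \sigma_h^2 & \sigma_h^2 E_0 + \sigma_n^2 & R_h(T) E_0 \sigma_h^2 \\
R_h(T-\ell) \sqrt{E_0}\, \sigma_h^2 & R_h(T) E_0 \sigma_h^2 & \sigma_h^2 E_0 + \sigma_n^2
\end{bmatrix} \right).$$

Evaluating $\hat{h}'_{mT+\ell} = E\left[h'_{mT+\ell} \mid y'_{mT}, y'_{(m+1)T}\right]$ we obtain:

$$\hat{h}'_{mT+\ell} = \frac{\sqrt{\kappa_0}\, \sigma_h}{\sigma_n} \left[ r(\ell)\, y'_{mT} + r(T-\ell)\, y'_{(m+1)T} \right],$$

$$\omega_\ell^{(1,1)} = \left(r^2(\ell) + r^2(T-\ell)\right)\left(\kappa_0^2 + \kappa_0\right) + 2 \kappa_0^2\, r(\ell)\, r(T-\ell)\, R_h(T),$$

where

$$r(k) = \frac{R_h(k)\,(\kappa_0 + 1) - R_h(T)\, R_h(T - k)\, \kappa_0}{(\kappa_0 + 1)^2 - \kappa_0^2\, R_h^2(T)}.$$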
B. The Infinite Past Estimator for the AR(1) Model (Section 2.2)


In Appendix A, we found the estimator quality for the $Q_{(\infty,0)}$ estimator, for a general
channel correlation $R_h(\tau)$. Alternatively, the variance has been derived in [20, p. 443]
using standard Kalman filter theory. The estimator quality is given implicitly to be
(using the current notation)


OJ. 


(oo,0) 


Eq 


2T,  ,(oo,0) 
Ct  UJ« 


+  (l  - 


Solving  for  the  estimator  variance  yields 


(37) 


,  _  2( 
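The Kalman-filter route can be illustrated with a short steady-state computation. This is a sketch under stated assumptions, not the report's derivation: the channel is AR(1) with unit variance, the noise has unit variance, a pilot with SNR $\kappa_0$ is observed every $T$ slots, and the closed form below was obtained here by solving the resulting scalar Riccati fixed point. Function names are hypothetical.

```python
import math

def riccati_filtered_variance(alpha, T, kappa0, iters=200):
    """Iterate the scalar Riccati recursion pilot by pilot (requires |alpha| < 1)."""
    a = alpha ** (2 * T)
    P = 1.0                                   # prediction error variance before the first pilot
    for _ in range(iters):
        P_filt = P / (1.0 + kappa0 * P)       # measurement update at a pilot
        P = a * P_filt + (1.0 - a)            # predict T slots ahead to the next pilot
    return P / (1.0 + kappa0 * P)

def closed_form_filtered_variance(alpha, T, kappa0):
    """Positive root of the steady-state quadratic, written in a cancellation-free form."""
    a = alpha ** (2 * T)
    return 1.0 / (0.5 * (1 + kappa0)
                  + math.sqrt(0.25 * (1 + kappa0) ** 2 + kappa0 * a / (1 - a)))

def omega_inf(alpha, T, ell, kappa0):
    """Estimator quality ell slots after a pilot under the assumptions above."""
    return alpha ** (2 * ell) * (1.0 - closed_form_filtered_variance(alpha, T, kappa0))
```

The same bracketed expression, one minus the filtered variance, reappears in the Q(∞,0) energy optimization of Appendix F.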
C. Interleaving (Section 2.2)


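The system operates on codewords of length $N' = N(T-1)$. Without loss of generality, consider the codeword that starts at time $k = 0$, denoted by

$$\mathbf{s}' = [s_1, \ldots, s_{T-1}, s_{T+1}, \ldots, s_{2T-1}, \ldots, s_{(N-1)T+1}, \ldots, s_{NT-1}]^t.$$

Similarly, let $\mathbf{h}' = [h'_1, \ldots, h'_{T-1}, h'_{T+1}, \ldots, h'_{NT-1}]^t$ denote the channel, let $\hat{\mathbf{h}}' = [\hat{h}'_1, \ldots, \hat{h}'_{T-1}, \hat{h}'_{T+1}, \ldots, \hat{h}'_{NT-1}]^t$ denote the channel estimate, and let $\tilde{\mathbf{h}}' = [\tilde{h}'_1, \ldots, \tilde{h}'_{T-1}, \tilde{h}'_{T+1}, \ldots, \tilde{h}'_{NT-1}]^t$ denote the channel estimation error during the transmission of a codeword. It follows that $\Sigma = \frac{1}{\sigma_h^2} E[\mathbf{h}'\mathbf{h}'^H]$, with $(\Sigma)_{ij} = R_h(|i-j|)$, where $R_h(\cdot)$ is the normalized channel correlation function. Similarly, we define normalized correlation matrices for the channel estimate and estimation error,

$$\hat{\Sigma} = \frac{1}{\sigma_h^2} E[\hat{\mathbf{h}}'\hat{\mathbf{h}}'^H] \quad \text{and} \quad \tilde{\Sigma} = \frac{1}{\sigma_h^2} E[\tilde{\mathbf{h}}'\tilde{\mathbf{h}}'^H].$$

The output is $\mathbf{y}' = \sqrt{E}\,\hat{\mathbf{h}}' \odot \mathbf{v} + \sqrt{E}\,\tilde{\mathbf{h}}' \odot \mathbf{v} + \mathbf{n}$, where $E = \mathrm{diag}\{E_\ell\}_{\ell=1}^{T-1} \otimes I_N$ is the energy matrix, and where the noise vector is $\mathbf{n} = [n_1, \ldots, n_{T-1}, n_{T+1}, \ldots, n_{NT-1}]^t$. The matrices $\hat{\Sigma}$ and $\tilde{\Sigma}$ depend on the particular estimation scheme, but, in general, the diagonal elements are given by $\hat{\Sigma} \odot I_{N'} = I_N \otimes \mathrm{diag}\{\omega_\ell\}$ and $\tilde{\Sigma} \odot I_{N'} = I_N \otimes \mathrm{diag}\{1-\omega_\ell\}$.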


The cutoff rate for generalized binary signaling without interleaving can be found in a manner similar to that described in section 3. Considering a super-symbol of length $N'$, the cutoff rate is given by


Ro  =  lim  max 

W-5>oo  Q(.) 


Qi/yPiy^'^ 


ve  x...x5|C^ 


dh  dy . 


Evaluating, the non-interleaved cutoff rate is


Ro 


lim  min  — — 

N^oo  Q{.)  NT 


^0^2  QMQM 


v,we  x...x5|f_ 


In'  +  KVEV^ 

1/2 

In'  +  KWEW^ 

1/2 

In'  +  I  /C  I 

[VEVH  +  WEWh'^ 

1  +  i  K{y  -w)E{y  -w)H 

(38) 


where $V = \mathrm{diag}\{\mathbf{v}\}$, $W = \mathrm{diag}\{\mathbf{w}\}$, and where $\mathcal{K} = \mathrm{diag}\{\kappa_\ell\}_{\ell=1}^{T-1} \otimes I_N$ is the faded energy matrix.
In practice, P-depth (finite) interleavers are often used to reduce the correlation within a transmitted codeword. The P-depth interleaver collects a codeword for transmission; it then transmits consecutive letters of the codeword once every P−1 symbol slots. To avoid a loss in overall transmission rate, P codewords are multiplexed in this fashion. For channels which can be accurately modelled as wide-sense stationary with a monotonically decreasing correlation function (e.g., a channel described by a first-order AR(1) process, $R_h(\tau) = R_\alpha(\tau) = \alpha^{|\tau|}$, $-1 < \alpha < 1$), the correlation of the channel within a codeword decreases as the depth of the interleaver P increases. For channels whose correlation function can only be bounded by an exponentially decreasing envelope, i.e., $|R_h(\tau)| < e^{-\lambda|\tau|}$, $\lambda > 0$, correlation is not reduced for each increase in P, but, as a general trend, it does decrease with increasing P. If the value of P is chosen large enough for this class of channels, the correlation within a codeword can be made arbitrarily small, presumably leading to better system performance (at the expense of added complexity and delay) when using codes designed for independently fading channels.
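A P-depth interleaver of the kind described above can be sketched as a simple block multiplexer. This is only an illustration: it ignores pilot slots, assumes all P codewords have equal length, and assumes that consecutive letters of one codeword end up P slots apart (P−1 other slots in between). The helper names are hypothetical.

```python
import numpy as np

def interleave(codewords):
    """Multiplex P equal-length codewords: slot j*P + i carries letter j of codeword i."""
    block = np.asarray(codewords)          # shape (P, L)
    return block.T.reshape(-1)

def deinterleave(stream, depth):
    """Invert interleave() for a stream built from `depth` codewords."""
    return np.asarray(stream).reshape(-1, depth).T

def adjacent_letter_correlation(alpha, depth):
    """AR(1) channel correlation between consecutive letters of one codeword."""
    return abs(alpha) ** depth
```

For an AR(1) channel the correlation between adjacent letters of a codeword then falls as $|\alpha|^P$, which is the monotone decrease with P noted in the text.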

A common assumption used in the analysis of correlated fading channels is that of a “perfect” (or infinite depth) interleaver at the transmitter and a corresponding deinterleaver at the receiver. The infinite depth interleaver simplifies the analysis of correlated fading channels, and is often an implicit assumption when i.i.d. channel models are used in analysis. The infinite depth interleaver effectively removes the correlation of the channel within a codeword transmission. The assumption of infinite interleaving is equivalent to setting the non-diagonal entries of the correlation matrices $\hat{\Sigma}$ and $\tilde{\Sigma}$ to 0, so that

$$\hat{\Sigma} \leftarrow \hat{\Sigma} \odot I_{N'} = I_N \otimes \mathrm{diag}\{\omega_\ell\}, \text{ and}$$

$$\tilde{\Sigma} \leftarrow \tilde{\Sigma} \odot I_{N'} = I_N \otimes \mathrm{diag}\{1-\omega_\ell\}.$$


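Also, given the perfect interleaving assumption, $\hat{\mathbf{h}}$ and $\tilde{\mathbf{h}}$ are mutually independent, i.e., $E[\hat{\mathbf{h}}\tilde{\mathbf{h}}^H] = 0_{N'}$, so that the ML decoder, generally given by

$$\hat{\mathbf{v}} = \arg\max_{\mathbf{v}} P(\mathbf{y}', \hat{\mathbf{h}}' \mid \mathbf{v}),$$

reduces to a product-wise detector

$$\hat{\mathbf{v}} = \arg\max_{\mathbf{v}} \prod_{k=1}^{N'} P(y_k, \hat{h}_k \mid v_k),$$

where the sequence $\{\hat{h}_k\}$ is an independent sequence having the same marginal statistics as described by the post-estimation, post-interleaving system of equation 30.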
Note that under perfect interleaving,


<5(v)  = 


T-l 


£=l 


N{T-1) 


and  that  equation  37  reduces  to  the  interleaved  cutoff  rate  given  by  equation  10. 
D. Cutoff Rate Derivation (Section 3)


Here,  we  evaluate  equation  9,  and  show  that  it  yields  equation  10.  Expanding  equation  9 


Ro  =  —  mm 


Q{.)  T 


s,vg  cSix...XcSt-i 


Q(s)Q(v)£’j; 


^(y|s,h)v^(y|’^’h)  dy 


L-'y 


(39) 


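Note that $\hat{\mathbf{h}} \sim \mathcal{CN}(0, \hat{\Sigma})$, where $\hat{\Sigma} = \mathrm{diag}\{\omega_\ell\}_{\ell=1}^{T'}$. Next, we make the following definitions: $E = \mathrm{diag}\{E_\ell\}_{\ell=1}^{T'}$, $\mathbf{u}_s = \sqrt{E}S\hat{\mathbf{h}}$, $\mathbf{u}_v = \sqrt{E}V\hat{\mathbf{h}}$, $\Sigma_s = \sigma_h^2 ES\tilde{\Sigma}S^H + \sigma_n^2 I_{T'}$, and $\Sigma_v = \sigma_h^2 EV\tilde{\Sigma}V^H + \sigma_n^2 I_{T'}$, where $S = \mathrm{diag}\{\mathbf{s}\}$, $V = \mathrm{diag}\{\mathbf{v}\}$, and $\tilde{\Sigma} = \mathrm{diag}\{1-\omega_\ell\}_{\ell=1}^{T'}$.

Note that

$$\mathbf{y} \mid \mathbf{s}, \hat{\mathbf{h}} \sim \mathcal{CN}(\mathbf{u}_s, \Sigma_s),$$
$$\mathbf{y} \mid \mathbf{v}, \hat{\mathbf{h}} \sim \mathcal{CN}(\mathbf{u}_v, \Sigma_v),$$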

and  that 


P(y|s,h)P(y|v,h) 

^  exp  {-(y  -  u,)^E;^(y  -  u,)  -  (y  -  u^)^S~^(y  -  u^)} 

Tl^^'\EsEy\ 

_  exp  {-y^(S;^  +  S-I)y  +  2Re  (y^(S7%*  +  S-%^)  -  -  u^S-^u„} 

“  7r2'^'|S,S,| 


Evaluating  the  integral,  we  obtain 


=  £. 


P(y|s,h)P(y|v,h) 


z^(Ss  ^Us+S„  ^u„)} 


Ss  ^Us+U^S„  ^u„) 


-1 


,1/2  ’ 


where  z  CM  (^0,  ^  2 

determinants  above,  we  get 


.  Evaluating  the  expectation  and  simplifying  the  ratio  of 


P(y|s,h)P(y|v,  h) 


1 I  — 1,,  I  — 1 


=  e 
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S,  u,+S,  u«j^_l(ufS“V,+ufS,7V„] 


|l/2 

|l/2 

Sg+Si, 

2 

=  0  2^ 


|l/2 

|l/2 
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(40) 
Next, we take the expectation of equation 39 with respect to $\hat{\mathbf{h}}$. We get (substituting back for $\mathbf{u}_s$ and $\mathbf{u}_v$):


=  £’c 


,-i(/E(s'-y)h)"(Ss+s„)-i(v^(s-n)h) 
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|l/2 

Ss+St) 

2 

It'  +  Ve{s  -  -  v)Ve 


I  Sg-|~St; 
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-  1^)V^  S  -  1^)^ 

Substituting for $\Sigma_s$ and $\Sigma_v$, we get
&  I  f  v'F(y|S,h)P(y|l/,h) 


+  ctIVeses^Ve 

1/2 

+  (tIVEVEV^^/E 

1/2 

It'CtI  +  1  a2 Ve 

1  Ve  +  1  ^2  V:f(A  -  l^)S(A  -  V)^^/E 

All  matrices  above  are  diagonal.  Simplifying  equation  40  we  obtain 

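$$\prod_{\ell=1}^{T'} \frac{\sqrt{1+\kappa_\ell(1-\omega_\ell)|s_\ell|^2}\,\sqrt{1+\kappa_\ell(1-\omega_\ell)|v_\ell|^2}}{1+\frac{\kappa_\ell}{2}(1-\omega_\ell)\left(|s_\ell|^2+|v_\ell|^2\right)+\frac{\kappa_\ell}{4}\,\omega_\ell\,|s_\ell-v_\ell|^2}$$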
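The per-slot factor in the product above is easy to evaluate numerically. The sketch below assumes the factor has the form $\sqrt{1+\kappa(1-\omega)|s|^2}\sqrt{1+\kappa(1-\omega)|v|^2}\,/\,[1+\tfrac{\kappa}{2}(1-\omega)(|s|^2+|v|^2)+\tfrac{\kappa}{4}\omega|s-v|^2]$ (a reading of the garbled source that is consistent with the BPSK and no-CSI expressions used elsewhere in these appendices), and evaluates the single-slot contribution for equiprobable BPSK. Function names are hypothetical.

```python
import math

def pairwise_factor(kappa, omega, s, v):
    """Per-slot factor for the input pair (s, v) under the assumed form stated above."""
    num = (math.sqrt(1 + kappa * (1 - omega) * abs(s) ** 2)
           * math.sqrt(1 + kappa * (1 - omega) * abs(v) ** 2))
    den = (1 + 0.5 * kappa * (1 - omega) * (abs(s) ** 2 + abs(v) ** 2)
           + 0.25 * kappa * omega * abs(s - v) ** 2)
    return num / den

def slot_rate_bpsk(kappa, omega):
    """-log2 of the double sum over equiprobable inputs {+1, -1} for one slot."""
    total = sum(0.25 * pairwise_factor(kappa, omega, s, v)
                for s in (1, -1) for v in (1, -1))
    return -math.log2(total)
```

For BPSK the factor with $s = -v$ collapses to $1 - \omega\kappa/(1+\kappa)$, so the slot contribution is $-\log_2[1 - \tfrac{1}{2}\omega\kappa/(1+\kappa)]$, the form that appears in Appendices F and G.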

Substituting  this  result  into  equation  38,  we  obtain 

T-l 

:)2 


1 


T  ^  Q,{.) 


E  Ai+  VI  +  «((i  - 


Finally,  evaluation  of  the  double  sum  in  equation  41  yields  equation  10. 


(41) 


(42) 
E. OOK Optimality with No CSI (Section 3.1)


It is sufficient to consider the $\ell$-th term in equation 10. Setting $\omega_\ell = 0$ and then dropping the subscript for brevity, we seek to minimize

$$p(1-p)\left[\frac{\sqrt{1+\kappa|A|^2}\,\sqrt{1+\kappa|B|^2}}{1+\frac{\kappa}{2}\left(|A|^2+|B|^2\right)} - 1\right] \tag{43}$$

subject to the constraint set $J = \{p|A|^2 + (1-p)|B|^2 = 1,\ 0 \le p \le 1,\ |A|^2, |B|^2 \ge 0\}$. For any fixed value of p, we will show that the minimizer of equation 42 is a form of OOK. Without loss of generality assume that $|A|^2 \ge 1$ and $|B|^2 \le 1$, so that $|A|^2 = |B|^2 + \Delta_B$, where $\Delta_B \ge 0$ is a real number. Fixing p, we restate the minimization problem

$$\min_{J'} f, \qquad f = \frac{\sqrt{1+\kappa(|B|^2+\Delta_B)}\,\sqrt{1+\kappa|B|^2}}{1+\frac{\kappa}{2}\left(2|B|^2+\Delta_B\right)},$$

where $J' = \{|B|^2 + p\,\Delta_B = 1,\ |B|^2 \ge 0,\ \Delta_B \ge 0\}$. Note that


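(after simplification)

$$\frac{\partial f}{\partial |B|^2} = \frac{\kappa^3\Delta_B^2}{\sqrt{1+\kappa|B|^2}\,\sqrt{1+\kappa(|B|^2+\Delta_B)}\,\left(2+2\kappa|B|^2+\kappa\Delta_B\right)^2} \ge 0, \tag{44}$$

$$\frac{\partial f}{\partial \Delta_B} = -\frac{\kappa^2\Delta_B\,\sqrt{1+\kappa|B|^2}}{\sqrt{1+\kappa(|B|^2+\Delta_B)}\,\left(2+2\kappa|B|^2+\kappa\Delta_B\right)^2} \le 0. \tag{45}$$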

Equation 43 implies that, for any fixed $\Delta_B$, $|B|^2$ should be made as small as possible. The smallest possible value $|B|^2$ can take in $J'$ is 0. Equation 44 implies that, for any fixed $|B|^2$, $\Delta_B$ should be made as large as possible. The largest possible value $\Delta_B$ can take in $J'$ is $1/p$. Both of these objectives can be satisfied simultaneously, and therefore, for any fixed p, $|B|^2 = 0$ and $\Delta_B = |A|^2 = 1/p$ minimizes equation 42. Therefore, when $\omega_\ell = 0$, an ON-OFF keying solution ($B = 0$) is always optimal. To find the optimal value of p (and hence, the optimal value of A), we substitute $|B|^2 = 0$ and $|A|^2 = 1/p$ into equation 42; the optimal value of p is given implicitly by

$$p^* = \arg\min_{0 \le p \le 1}\; p(1-p)\left[\frac{\sqrt{1+\kappa/p}}{1+\frac{\kappa}{2p}} - 1\right].$$

As $\kappa \to \infty$, $p^* \to 1/2$, and as $\kappa \to 0$, $p^* \to 0$.
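The implicit characterization of $p^*$ can be explored with a simple grid search. The sketch below assumes the OOK objective $p(1-p)\left[\sqrt{1+\kappa/p}\,/\,(1+\kappa/(2p)) - 1\right]$ and searches a logarithmic grid on (0, 1); the grid size and function names are arbitrary choices made for illustration.

```python
import numpy as np

def ook_objective(p, kappa):
    """Objective after substituting |B|^2 = 0 and |A|^2 = 1/p (assumed form)."""
    x = kappa / p
    return p * (1 - p) * (np.sqrt(1 + x) / (1 + 0.5 * x) - 1)

def optimal_on_probability(kappa, n=20001):
    """Grid-search minimizer of the OOK objective over 0 < p < 1."""
    p = np.logspace(-8, 0, n, endpoint=False)   # log grid resolves very small p
    return float(p[np.argmin(ook_objective(p, kappa))])
```

At large $\kappa$ the search returns a value close to 1/2, and at small $\kappa$ a value close to 0, in line with the two limits stated above.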
F. Derivation of the Optimal Training Energy (Section 5.1)


Optimal Energy for the Q(1,0) Estimator

For  T  =  2,  the  proof  of  the  theorem  follows  easily.  We  prove  the  theorem  for  T  >  2. 
Substituting  the  energy  constraint  into  equation  12,  we  wish  to  maximize 


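$$R_0 = -\frac{1}{T}\sum_{\ell=1}^{T-1}\log_2\left\{1 + \frac{1}{2}\alpha^{2\ell}\,\frac{\kappa_1^2(T-1) - \kappa_1\kappa_{tot}}{-\kappa_1^2(T-1) + \kappa_1(\kappa_{tot}-T+2) + \kappa_{tot}+1}\right\} \tag{46}$$

over the region $0 \le \kappa_1 \le \kappa_{tot}/(T-1)$.

The p-th term ($1 \le p \le T-1$) in the summation above has a maximizer $\kappa^*_{1,p}$ that is independent of p. Therefore, since each term in equation 45 is maximized by $\kappa^*_{1,p}$, the overall maximizer of equation 45, $\kappa_1^*$, is given simply by $\kappa_1^* = \kappa^*_{1,p}$. Finding $\kappa_1^* = \kappa^*_{1,p}$ is equivalent to

$$\kappa_1^* = \arg\min_{\kappa_1}\; \frac{\kappa_1^2(T-1) - \kappa_1\kappa_{tot}}{-\kappa_1^2(T-1) + \kappa_1(\kappa_{tot}-T+2) + \kappa_{tot}+1}.$$

It follows readily that¹

$$\kappa_1^* = \Gamma - \sqrt{\Gamma^2 - \frac{\kappa_{tot}}{T-1}\,\Gamma}, \qquad \Gamma = \frac{\kappa_{av}T+1}{T-2}. \tag{47}$$

Finally, we must verify that $\kappa_1^*$ is in the valid range $[0, \kappa_{tot}/(T-1)]$. Note that, since $\Gamma > \kappa_{tot}/(T-1)$, equation 46 always yields a real number. Verifying $\kappa_1^* \ge 0$ is equivalent to verifying that the first term in equation 46 is larger in magnitude than the second term, which follows easily. To verify that $\kappa_1^* \le \kappa_{tot}/(T-1)$, we must verify that

$$\Gamma - \sqrt{\Gamma^2 - \frac{\kappa_{tot}}{T-1}\,\Gamma} \le \frac{\kappa_{tot}}{T-1},$$

or (expanding and cancelling), that $\kappa_{tot}/(T-1) \le \Gamma$, which is true.

¹The critical points of

$$\frac{a_1x^2 + a_2x}{a_3x^2 + a_4x + a_5}$$

are ($a_1a_4 \ne a_2a_3$)

$$x = \frac{-a_1a_5 \pm \sqrt{a_1^2a_5^2 - (a_1a_4 - a_2a_3)\,a_2a_5}}{a_1a_4 - a_2a_3}.$$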
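The closed form for the Q(1,0) estimator can be checked against a brute-force search. The sketch below assumes the per-term quantity being maximized is $\frac{\kappa_0}{1+\kappa_0}\cdot\frac{\kappa_1}{1+\kappa_1}$ with the energy constraint $\kappa_0 + (T-1)\kappa_1 = \kappa_{tot}$, and that the optimum has the form $\kappa_1^* = \Gamma - \sqrt{\Gamma^2 - \Gamma\kappa_{tot}/(T-1)}$ with $\Gamma = (\kappa_{tot}+1)/(T-2)$ for $T > 2$. Function names are hypothetical.

```python
import numpy as np

def optimal_data_energy_q10(kappa_tot, T):
    """Assumed closed form for the optimal per-slot data energy (T > 2)."""
    gamma = (kappa_tot + 1.0) / (T - 2.0)
    return gamma - np.sqrt(gamma ** 2 - gamma * kappa_tot / (T - 1.0))

def product_term(kappa1, kappa_tot, T):
    """kappa0/(1+kappa0) * kappa1/(1+kappa1) with kappa0 fixed by the energy constraint."""
    kappa0 = kappa_tot - (T - 1) * kappa1
    return kappa0 / (1 + kappa0) * kappa1 / (1 + kappa1)

def grid_search_data_energy(kappa_tot, T, n=250001):
    """Brute-force maximizer over 0 <= kappa1 <= kappa_tot/(T-1)."""
    grid = np.linspace(0.0, kappa_tot / (T - 1), n)
    return float(grid[np.argmax(product_term(grid, kappa_tot, T))])
```

For example, with $T = 5$ and $\kappa_{tot} = 10$ both routes give $\kappa_1^* \approx 1.598$, i.e. a training energy of $\kappa_0^* = \kappa_{tot} - (T-1)\kappa_1^* \approx 3.61$.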
Optimal Energy for the Q(∞,0) Estimator


We  consider  T  >  2.  Substituting  the  energy  constraint  into  equation  12,  we  seek  to 
maximize 

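$$R_0 = -\frac{1}{T}\sum_{\ell=1}^{T-1}\log_2\left\{1 - \frac{1}{2}\alpha^{2\ell}\left[\frac{\kappa_{tot}-\kappa_0}{\kappa_{tot}-\kappa_0+(T-1)}\right]\left[1 - \frac{1}{\frac{1}{2}(1+\kappa_0)+\sqrt{\frac{1}{4}(1+\kappa_0)^2+\kappa_0\frac{\alpha^{2T}}{1-\alpha^{2T}}}}\right]\right\}$$

over the region $0 \le \kappa_0 \le \kappa_{tot}$. The maximizer of the p-th term is given by

$$\kappa^*_{0,p} = \arg\max_{0\le\kappa_0\le\kappa_{tot}} \left[\frac{\kappa_{tot}-\kappa_0}{\kappa_{tot}-\kappa_0+(T-1)}\right]\left[1 - \frac{1}{\frac{1}{2}(1+\kappa_0)+\sqrt{\frac{1}{4}(1+\kappa_0)^2+\kappa_0\frac{\alpha^{2T}}{1-\alpha^{2T}}}}\right],$$

which is independent of p. Therefore,

$$\kappa_0^* = \arg\max_{0\le\kappa_0\le\kappa_{tot}} \left[\frac{\kappa_{tot}-\kappa_0}{\kappa_{tot}-\kappa_0+(T-1)}\right]\left[1 - \frac{1}{\frac{1}{2}(1+\kappa_0)+\sqrt{\frac{1}{4}(1+\kappa_0)^2+\kappa_0\frac{\alpha^{2T}}{1-\alpha^{2T}}}}\right]. \tag{48}$$

At low SNR, we simplify the left-hand and right-hand factors above by noting that $\kappa_0 \le \kappa_{tot} \to 0$, and rewrite the maximization problem as

$$\kappa_0^* = \arg\max_{0\le\kappa_0\le\kappa_{tot}} \left[\frac{\kappa_{tot}-\kappa_0}{T-1}\right]\left[1 - \frac{1}{1+\kappa_0\left(1+\frac{\alpha^{2T}}{1-\alpha^{2T}}\right)}\right], \tag{49}$$

where we have used the fact that $\sqrt{1+x} \approx 1 + x/2$ for small x. The maximizer is seen to occur for $\kappa_0^* = \kappa_{tot}/2$. At high SNR, we first assume that as $\kappa_{tot} \to \infty$, the ratio $\kappa_0^*/\kappa_{tot}$ remains finite, so that $\kappa_0^* \to \infty$ as well. Under this assumption, equation 47 becomes

$$\kappa_0^* = \arg\max_{0\le\kappa_0\le\kappa_{tot}} \left[\frac{\kappa_{tot}-\kappa_0}{\kappa_{tot}-\kappa_0+(T-1)}\right]\left[1 - \frac{1}{1+\kappa_0}\right],$$

from which we find that

$$\kappa_0^* = \frac{\sqrt{(T-1)(\kappa_{tot}+1)(\kappa_{tot}+T-1)} - (\kappa_{tot}+T-1)}{T-2}.$$

Since $\kappa_{tot} \to \infty$, the expression above can be simplified, $\kappa_0^* = \frac{\kappa_{tot}}{T-2}\left[\sqrt{T-1}-1\right]$. Next, we assume that as $\kappa_{tot} \to \infty$, the ratio $\kappa_0^*/\kappa_{tot} \to 0$. In this case, equation 47 becomes

$$\kappa_0^* = \arg\max_{0\le\kappa_0\le\kappa_{tot}} \left[1 - \frac{1}{\frac{1}{2}(1+\kappa_0)+\sqrt{\frac{1}{4}(1+\kappa_0)^2+\kappa_0\frac{\alpha^{2T}}{1-\alpha^{2T}}}}\right], \tag{50}$$

which produces the zero-rate result $\kappa_0^* = \kappa_{tot}$. Therefore, the first high SNR assumption is the correct one, i.e., $\kappa_0^* = \frac{\kappa_{tot}}{T-2}\left[\sqrt{T-1}-1\right]$ as $\kappa_{tot} \to \infty$.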
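The low- and high-SNR behaviour of the Q(∞,0) training energy can be examined numerically. The sketch below assumes the per-term objective is the product of $\frac{\kappa_{tot}-\kappa_0}{\kappa_{tot}-\kappa_0+(T-1)}$ and $1 - 1/\left[\tfrac{1}{2}(1+\kappa_0)+\sqrt{\tfrac{1}{4}(1+\kappa_0)^2+\kappa_0\,\alpha^{2T}/(1-\alpha^{2T})}\right]$, and maximizes it on a uniform grid. The grid size and names are illustrative choices.

```python
import numpy as np

def omega0_inf(kappa0, alpha, T):
    """Assumed estimator-quality factor for the infinite-past estimator (|alpha| < 1)."""
    a = alpha ** (2 * T)
    return 1 - 1 / (0.5 * (1 + kappa0)
                    + np.sqrt(0.25 * (1 + kappa0) ** 2 + kappa0 * a / (1 - a)))

def best_training_energy(kappa_tot, alpha, T, n=200001):
    """Grid-search maximizer of the per-term objective over 0 <= kappa0 <= kappa_tot."""
    k0 = np.linspace(0.0, kappa_tot, n)
    data = (kappa_tot - k0) / (kappa_tot - k0 + (T - 1))
    return float(k0[np.argmax(data * omega0_inf(k0, alpha, T))])
```

With $T = 5$ and $\alpha = 0.9$, the fraction $\kappa_0^*/\kappa_{tot}$ found this way is close to 1/2 at very low SNR and close to $(\sqrt{T-1}-1)/(T-2) = 1/3$ at high SNR.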


Bounds on Optimal Energy for the Q(1,1) Estimator


For  the  (1, 1)  estimator,  the  cutoff  rate  is  given  by 


1  r 

Ro  =  1- 


■(1.1) 

^  1  +  Ki 


where,  rewriting  equation  7,  with  ~  2a^^, 

B(^  =  and  C  =  1  — 

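When $\alpha \ll 1$, the estimator quality simplifies to $\omega_\ell^{(1,1)} = \frac{\kappa_0}{1+\kappa_0}\max\left(\alpha^{2\ell}, \alpha^{2(T-\ell)}\right)$ for $\ell \ne T/2$ or $\omega_\ell^{(1,1)} = 2\alpha^{T}\frac{\kappa_0}{1+\kappa_0}$ for $\ell = T/2$, which implies that only the “closest” pilot aids in estimation. The optimization of equation 50 is equivalent to the maximization problem for the Q(1,0) estimator, and yields $\kappa^*_{0,(1,1)} = \kappa^*_{0,(1,0)}$.

When $\alpha \to 1$, the estimator quality simplifies to $\omega_\ell^{(1,1)} = \frac{2\kappa_0}{2\kappa_0+1}$, so that

$$\kappa_0^* = \arg\max_{0\le\kappa_0\le\kappa_{tot}} \frac{\kappa_0}{\kappa_0+1/2}\cdot\frac{\kappa_{tot}-\kappa_0}{\kappa_{tot}-\kappa_0+(T-1)},$$

which yields

$$\kappa_0^* = -\rho + \sqrt{\rho^2 + \rho\,\kappa_{tot}}, \quad \text{where } \rho = \frac{\kappa_{tot}+(T-1)}{2T-3}.$$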

Next, we consider the low and high SNR cases. Returning to equation 50, we denote the maximizer of the p-th term in the expression above by $\kappa^*_{0,p}$ and note that


o<Ko<K,tot  [tCtot  ~  tto  +  T— ij  [ttgC*  -f-  2Ko  1 
where  we  have  used  the  total  energy  constraint  in  the  left  term  above. 


tttot  —  tto 


+  tto-Bp 


In general, explicit knowledge of $\kappa^*_{0,p}$ is not enough to determine $\kappa_0^*$. However, at low SNR $\kappa^*_{0,p} = \kappa_{tot}/2$, which is independent of p, and so $\kappa_0^* = \kappa^*_{0,p} = \kappa_{tot}/2$. To see this we rewrite

equation 52 under the condition that $\kappa_0 \le \kappa_{tot} \to 0$, and obtain


*  tttot  —  tto 

K,n„  =  max  — — - 

0<K,0<K,tot  T-1 


HoBp- 


From equation 53, it is easy to verify that $\kappa_0^* = \kappa^*_{0,p} = \kappa_{tot}/2$.

At high SNR, we return to equation 52, and multiply out terms to obtain

*  ttgAp  KiqI^Kiiq^Ap  Bp^  t€o(tttot-Hp) 

o<K,o<K,tot  K,qC  —  Kq  [C(K,tot  S-  Z)  —  2\  —  tiQ  [2  (tttot  R  Z)  —  1]  —  (Ktot  —  Z) 
where $Z = T-1$. Next, we note that the function

aix^  +  a2X^  +  a^x 

3\^)  = - 5 - ^ - 

a^x-^  +  a^x-^  +  ttQX  +  07 

has  critical  points  given  implicitly  by 


(0105  -  02*^4)^  {^CblCbQ  -  20304)3^^  -|-  (30407 


Applying  this  fact  to  equation  54,  we  determine  the  relevant  quartic  for  which  at  least  one 
root  equals 

^0%  +  +  64  =  0 

where 


_  (2Ap  —  BpC)  Ktot  +  Ap[2Z  —  1) 

^  “  BpC  +  Ap{CZ  -  2)  ’ 

_  Ktot  [~2Ap  +  BpC]  +  tCtot  [~2Ap(— 2  -\-  Z)  ~\~  BpCZ]  +  3ApZ  +  Bp{2Z  —  1) 
^  “  BpC  +  Ap{CZ  -  2) 

1  _  r,  ~^p^tot  +  tttot(-Bp  ~  ApZ)  +  -BpZ^ 

SpC  +  Ap{CZ  -  2)  ’ 

,  _  ~-Bptt(-Q4  —  BpKtotZ 

^  “  SpC  +  Ap(C'Z  -  2)  ■ 


Next, we invoke the high SNR assumption. The coefficients can be simplified:

Bi  ^2 


5i  =  Idz: 


■‘2BpC 
r  ^tot  7 


62  =  1^'^ 


tot,  h  =  -^^l^tot,  and  64  =  where 


$r = B_pC + A_p(CZ-2)$. We continue to evaluate the quartic in this fashion (e.g., see [33, p. 11]). Retaining the relevant (positive) root, we eventually find that

$$\kappa^*_{0,p} \approx \kappa_{tot}\left[\gamma(p) + \sqrt{\gamma(p)^2 - \gamma(p)}\right] \tag{56}$$

where

$$\gamma(p) = \frac{-2A_p + B_pC}{B_pC + A_p(CZ-2)}$$

^  -{T  -  2)a^P  -  (T  -  2)a2T  +  +  2(T  -  3)a2(p+n)  +  Ta^{2p+T)  _  2(t  -  I)a2(p+2T)  ’ 

where the last expression is in terms of fundamental quantities only. It can be verified that the root given in equation 55 (i.e., the maximizer of the p-th term) is the only root within the range $0 \le (\cdot) \le \kappa_{tot}$, and that $\kappa^*_{0,p}$ is decreasing for $p = 1, \ldots, \lceil T/2 \rceil$ and increasing for $p = \lceil T/2 \rceil + 1, \ldots, T-1$. This implies that the overall maximizer, $\kappa_0^*$, satisfies


7 


< 


tttot 


< 


7(1)  +  \/7(l)^  -7(1)1  •  (57) 
When $\alpha \ll 1$, we find that $\gamma(p) = -\frac{1}{T-2}$. Therefore, when $\kappa_{tot} \to \infty$ and $\alpha \ll 1$, equation 56 implies that $\kappa_0^* = \frac{\kappa_{tot}}{T-2}\left[\sqrt{T-1} - 1\right]$, since both sides of the bound are equal.

When $\alpha \to 1$, two applications of L'Hôpital's rule show that $\gamma(p)$ = 2p^-2pT^T-2)T'^ •  Using this fact in equation 56 reveals that, at high SNR when $\alpha \to 1$, a bound on $\kappa_0^*$ is given by equation 56 where

T2  -  2T  +  2 
T3  +  2T2  -  2T  +  2  ’ 

+  mod  (T,  2) 

2T3  +  3T2  +  mod  (T,  2) ' 

The expressions for $\kappa_0^*$ at large SNR, for small and large $\alpha$, can be evaluated for large T to show that $\kappa_0^* = \sqrt{T}\,\kappa_{av}$ in each case. Since $\kappa_0^*$ is a continuous function of $\alpha$, we conclude that at high SNR, $\kappa_0^* \approx \sqrt{T}\,\kappa_{av}$, with equality for large T.

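Proof that $\kappa^*_{0,(\infty,0)} \le \kappa^*_{0,(1,0)}$

The optimal training energies can be written in implicit form as

$$\kappa^*_{0,(1,0)} = \arg\max_{0\le\kappa_0\le\kappa_{tot}} g(\kappa_0),$$

$$\kappa^*_{0,(\infty,0)} = \arg\max_{0\le\kappa_0\le\kappa_{tot}} g(\kappa_0)f(\kappa_0),$$

where

$$g(\kappa_0) \triangleq \left[\frac{\kappa_{tot}-\kappa_0}{\kappa_{tot}-\kappa_0+(T-1)}\right]\left[\frac{\kappa_0}{1+\kappa_0}\right]$$

and

$$f(\kappa_0) \triangleq \left[\frac{1+\kappa_0}{\kappa_0}\right]\left[1 - \frac{1}{\frac{1}{2}(1+\kappa_0)+\sqrt{\frac{1}{4}(1+\kappa_0)^2+\kappa_0\frac{\alpha^{2T}}{1-\alpha^{2T}}}}\right]. \tag{58}$$

Note that $g(x)$ has one critical point (a maximum) in the range $0 \le x \le \kappa_{tot}$ located at $x = \kappa^*_{0,(1,0)}$, and so is decreasing for $\kappa^*_{0,(1,0)} \le x \le \kappa_{tot}$. Note also that $f(\kappa_0)$, as defined in equation 58, is a decreasing function in the range $0 \le \kappa_0 \le \kappa_{tot}$. These two facts imply that $g(x)f(x)$ is a decreasing function for $\kappa^*_{0,(1,0)} \le x \le \kappa_{tot}$, and therefore, $\kappa^*_{0,(\infty,0)} = \arg\max_{0\le\kappa_0\le\kappa_{tot}} g(\kappa_0)f(\kappa_0) \le \kappa^*_{0,(1,0)}$.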


Proof that $\kappa^*_{0,(1,1)} \le \kappa^*_{0,(1,0)}$
We would like to show that


T-l 


arg  min  — 

o<Ko<Ktot  T 


5^  log: 


2  W 


t=l 


(1,0)  ~  tto  ^ 

- — - — —  >  arg  mm  — 

-  tto  +  T-l  o<Ko<Ktot  T 


1 

-  5]]  log2  a;J 


(1,1)  —  tto 


^=1 


ttav^  —  tto  +  7^—1 


Each term in the left-hand sum is minimized by the same value of $\kappa_0$. Each term on the right-hand side has one critical point (a minimum) in $0 \le \kappa_0 \le \kappa_{tot}$. Therefore, a sufficient (but not necessary) condition is to show that, for every $1 \le \ell \le T-1$,


(1,0)  ttavT  KjQ 

arg  max  uj)  - — -  >  arg  max 

o<Ko<Ktot  1  +  ttavT  -  Kq  o<Ko<Ktot 


U) 


(1,1) 


,  ,(1,0) 

U)  0 


to 


(1,0) ^  av  T  —  Kq 
1  +  ttavT  —  Kq 


This is proven by verifying that $\omega_\ell^{(1,1)}/\omega_\ell^{(1,0)}$ is a decreasing function of $\kappa_0$.
G. Derivation of the Training Period Bounds (Section 5.2)


Proof that High SNR Yields a Lower Bound on Training Period for Causal AR(1) Estimators


Here,  we  prove  that  for  the  AR(1)  model,  equation  15  gives  a  lower  bound  on  the  optimal 
training  period  that  is  exact  at  high  SNR.  Suppose  that  the  last  pilot  used  in  the 
estimation is n pilots old. Note that $\omega_\ell = \alpha^{2(nT+\ell)}\,\omega_{-nT}$ for any causal estimator,

and  that  at  high  SNR,  We  wish  to  show  that 

1  f  tti 

argmax  —  —  >  logo  <  1  —  cu-nT - 

^  T  T  ^  1  +  Ki  2 

e=i  ^ 

1 

arg  max - 

”  T  T 


Q,2(nr+€) 


where  the  right-hand  side  of  the  above  equation  is  the  cutoff  rate  as  ttav  — t  oo. 
Equivalently,  we  must  show  that 


-5trEi«g2{i-/3(r+i)c(r+i)} 

e=i _ 

-t  Eiog2{i-/3(r)c(r)} 

e=i 


1 

T+l 


Elog2{l 


e=i 


T 


T-1 


E  log2{l 

e=i 


c(r+i)} 

-  c(r)} 


where  /3(T)  =  w^nTj^^  is  optimized  over  Kq  and  Ki  for  a  fixed  T,  and  where 

c(T)  =  Next,  note  that  B(T)  is  an  increasing  function  of  T  (a  consequence  of  the 

fact  that  o^co-nT  >  a^i+ko^-  enough  to  show  that 

E  log2  {1  -  P{T)c{T+l)}  E  log2  {1  -  c{T+l)} 

i=i _ ^  ^=1 _ 

e'  log2  {1  -  mc{T)}  E  log2  {1  -  c(T)} 

£=1  £=1 


or  that 


T 


Elog2{l-/?(T)c(T+l)} 

e=i 


E'iog,{i-mc(T)} 

£=1 

is a decreasing function of $0 \le \beta(T) \le 1$, which is easily proven (noting that $c(T+1) < c(T)$). Therefore, the optimal training period at high SNR is a lower bound on the training period at any SNR; it is obviously attainable.
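The lower-bound property can be illustrated numerically for the last-pilot (n = 0) case. The sketch below is an illustration under assumptions, not the report's code: it takes the cutoff rate to be $R_0(T) = -\frac{1}{T}\sum_{\ell=1}^{T-1}\log_2\left[1-\tfrac{1}{2}\alpha^{2\ell}\rho(T)\right]$, where $\rho(T)$ is the largest value of $\frac{\kappa_0}{1+\kappa_0}\frac{\kappa_1}{1+\kappa_1}$ under $\kappa_0 + (T-1)\kappa_1 = \kappa_{av}T$, and it uses $\rho = 1$ to represent the high-SNR limit. The search range `T_max` and all names are hypothetical.

```python
import math

def rho_q10(kappa_av, T):
    """Largest kappa0/(1+kappa0) * kappa1/(1+kappa1) for a given period T."""
    if math.isinf(kappa_av):
        return 1.0                                   # high-SNR limit
    kappa_tot = kappa_av * T
    if T == 2:
        kappa1 = kappa_tot / 2.0                     # symmetric split for one data slot
    else:
        gamma = (kappa_tot + 1.0) / (T - 2.0)
        kappa1 = gamma - math.sqrt(gamma ** 2 - gamma * kappa_tot / (T - 1.0))
    kappa0 = kappa_tot - (T - 1) * kappa1
    return kappa0 / (1 + kappa0) * kappa1 / (1 + kappa1)

def cutoff_rate(alpha, kappa_av, T):
    """Assumed BPSK cutoff rate for training period T."""
    rho = rho_q10(kappa_av, T)
    return -sum(math.log2(1 - 0.5 * alpha ** (2 * l) * rho) for l in range(1, T)) / T

def best_period(alpha, kappa_av, T_max=200):
    """Training period in 2..T_max with the largest assumed cutoff rate."""
    return max(range(2, T_max + 1), key=lambda T: cutoff_rate(alpha, kappa_av, T))
```

For $\alpha = 0.95$ the high-SNR optimum under these assumptions is $T = 5$, and the optimum found at a finite SNR such as $\kappa_{av} = 0.1$ is no smaller.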
Proof that Training Period for Last Pilot Estimation is Larger than that for Infinite Past Estimation


Here we prove that the training period for the Q(1,0) estimator is larger than that of the Q(∞,0) estimator, $T^*_{(1,0)} \ge T^*_{(\infty,0)}$, for any channel with monotonically decreasing correlation function $R_h(\tau)$, under the equal energy assumption $\kappa_0 = \kappa_1 = \kappa_{av}$. We wish to show that


arg  max  ^ 
^  T  T 


1 


i=i 


RlV) 


n. 


1  +  ttav 
T-l 


>  arg  max  ^  log^  1  -  (t^av,  T)  - 

£=1  L 


K, 


+  tta 


where we emphasize that the estimator quality for the Q(∞,0) estimator is a function of $\kappa_{av}$ and T. To show that the inequality above is satisfied, we show that


-  E  l0g2 

t=l 

1 

1—1 

1  1 

^  ttav  ^ 
^  l  +  ZS^av  j 

2" 

-  E  l0g2 

.  1=1 

T-l 

-  E  log2 

£=1 

1 

T— 1 

1 _ 1 

(  ttav  ^ 
^  l  +  Zx-av  J 

2" 

1 

—  T-l 

-  E  log2 

(.=1 

Note that, because $R_h(\tau)$ is monotone decreasing, $\omega^{(\infty,0)}(\kappa_{av}, T)$ is a decreasing function of T. We use this fact to increase the right-hand side of the above expression, and prove the resulting inequality


-  E  log2 

t=l 

> 

cs; 

1 

1—1 

1  1 

2" 

-  E  l0g2 
^  ^=1 

[i  -  (K.v,r)  ife’ 

T-l 

-  E  log2 

t=l 

1 

T— 1 

1 _ 1 

2" 

I 

—  T-l 

-  E  log2 

e=i 

by  showing  that 


-  E  l0g2 

e=i 


1  - 


RlW 


P 


T-l 

-  E  log2 

i=l 


1  _ 

J-  2  " 


=  1  + 


log2 


1 


RliT) 


P 


T-l 

-  E  log2 

t=l 


1  _ 

J-  2  ” 
is a decreasing function in $0 \le \rho \le 1$, which is easily proven by noting again that

RUT)  >  RUt). 


Proof  that  High  SNR  Yields  a  Lower  Bound  on  Training  Period  for  the  Last 
Pilot  Estimator 


Here, we consider the Q(1,0) estimator, and prove that for any channel with decreasing correlation function $R_h(\tau)$, equation 27 gives a lower bound on the optimal training period that is exact at high SNR. The cutoff rate is


T~1 


i=l 


where $\rho(T) = \frac{\kappa_0}{1+\kappa_0}\cdot\frac{\kappa_1}{1+\kappa_1}$ is optimized over $\kappa_0$ and $\kappa_1$ for a fixed value of T. Note that

$\rho(T)$ is an increasing function of T. We wish to show that


T-l 


arg  max - 

^  T  T 


£=1 


Rl(t)  K„(T)  Ki(r) 

2  1  +  Ko{T^  1  +  Ki{T) 

>  arg  max  -  loga  [l  -  R 


T-l 


e=i 


It  is  sufficient  to  show  that 

T 

I 

e=i 


-  E  log2  f  1  -  ^^P{T  +1)1  -  E  log2 

_ i> 


1  - 


Rid) 


T-l 

-  E  log2 
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I  _  ^ 


P{T) 


T-l 

-  E  log2 

r=i 


1  - 


Rli^) 


We can replace the left-hand side of the above expression by a smaller quantity and show that the resulting inequality holds. Replacing $\rho(T+1)$ with $\rho(T)$, we must show that


logs 
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^p(T) 


logs 


T-l 

E  logs 
1=1 
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Rl(T) 
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E  log2 
or that the left-hand side is a decreasing function in $0 \le \rho(T) \le 1$. A sufficient condition is to show that


fip)  = 


-log2 

1 - 1 

1 

S) 

1 _ 1 

-logs 

\  Ri(Ry 

1  - 

is increasing in the range $0 \le \rho \le 1$, which follows by noting that $0 \le R_h(T) \le R_h(\ell) \le 1$.


H. Variable-Energy Substitution Function (Section 6.1)


First we show that the optimizer of the substitution function, $\boldsymbol{\kappa}^*$, coincides with the cutoff-rate optimizer as $\alpha \to 1$. As $\alpha \to 1$, all data slots following a pilot have the same estimator quality or “predictability,” and so we expect all data slots to be activated and $\boldsymbol{\kappa}^*$ to converge to the solution for the fixed-data-energy case, equation 13. That is, we expect $\boldsymbol{\kappa}^* = [\kappa_0, \kappa_1, \ldots, \kappa_1]$, where $\kappa_0$ and $\kappa_1$ are given in equation 13. This is the case, as will be shown below.

From equation 20, we activate all T−1 data slots iff $\kappa_{tot} \ge \phi_\alpha(T-2)$. Note that

-  2)  =  (T  -  2)  -  (T  -  2)  +  yi  =  0, 

and,  therefore,  all  T— 1  data  slots  are  activated.  Note  that 

lini  [Ktot  -  Ko{T-1)  +  (T  -  1)]  -  1 

_  tttot  —  tto(T  —  1) 

“  T-  1  ’ 

Therefore,  each  data  slot  is  allocated  an  equal  amount  of  data  energy.  Next,  note  that 

limAto(T-l)  = - - 1-  (T-2)2  +  T-1).  (61) 

Substituting equation 60 into 59 and simplifying, we get

$$\kappa_\ell = \frac{\kappa_{tot}+1}{T-2} - \sqrt{\frac{(T-1)(\kappa_{tot}+T-1)^2 - (T-1)(T-2)(\kappa_{tot}+T-1)}{(T-2)^2(T-1)^2}} = \frac{\kappa_{tot}+1}{T-2} - \sqrt{\frac{(\kappa_{tot}+1)(\kappa_{tot}+T-1)}{(T-1)(T-2)^2}},$$

which, with minor simplification, is seen to match equation 13.

Here, we show that the two optimizers also coincide as $\alpha \to 0$. As $\alpha \to 0$, predictability of the channel is lost and we expect only one data channel to be activated, and allocated half of the total available energy (for $T_a = 1$, the variable data energy problem reduces to the fixed data energy problem of equation 13). Next, we show that this is the case: from equation 20, we activate only one data channel if $0 \le \kappa_{tot} \le \phi_\alpha(2)$. Since $\lim_{\alpha\to 0}\phi_\alpha(2) = \infty$, only one data channel should be activated. From part (c) of equation 20, we see that the activated data slot is allocated half the total energy.

Here, we show that the two optimizers coincide as $\kappa_{tot} \to 0$. Note that $\kappa_{tot} \to 0$ implies that

$\kappa_\ell \to 0$, $0 \le \ell \le T-1$. We start from the cutoff rate expression in equation 18 and note
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that,  since  <  1, 

i  Y_5^y 

4  yi  +  Ato/  vi  +  ttf/ 

where a Taylor series expansion has been taken (around 0) for each term in the sum of equation 18. As $\kappa_{tot} \to 0$


Rn  = 
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T-l 

El 


i=\ 
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21 


Kq 


Ki 


1  +  tto  1  + 


+  0 


Ro 


1  1  2e  tto  Ke 

T  2  1  +  Ato  1  + 


(62) 


It is clear that optimization of equations 61 and 19 over $\boldsymbol{\kappa}$ yields the same optimizer, and therefore that the two optimizers coincide as $\kappa_{tot} \to 0$.
I. Energy Allocation for Variable-Energy Data Slots (Section 6.2)


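We will give the proof of the theorem for the case where $T_a > 1$. First, we verify that $R_0$ is concave in $\boldsymbol{\kappa}$. This follows easily:

$$\frac{\partial^2 R_0}{\partial\kappa_0^2} = -\frac{2}{(1+\kappa_0)^3}\sum_{\ell=1}^{T-1}\alpha^{2\ell}\frac{\kappa_\ell}{1+\kappa_\ell} \le 0,$$

$$\frac{\partial^2 R_0}{\partial\kappa_\ell^2} = -2\alpha^{2\ell}\,\frac{\kappa_0}{(1+\kappa_0)(1+\kappa_\ell)^3} \le 0, \quad \text{for } 1 \le \ell \le T-1.$$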


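Suppose M slots are active (the optimal value of $M = T_a$ will follow from the analysis). From the Kuhn-Tucker conditions [22], a necessary and sufficient condition for the elements of $\boldsymbol{\kappa}$ to optimize $R_0$ (given that M slots are active) is

$$\frac{\partial R_0}{\partial\kappa_\ell} = \lambda, \quad \text{for } 0 \le \ell \le M, \tag{63}$$

$$\left.\frac{\partial R_0}{\partial\kappa_\ell}\right|_{\kappa_\ell=0} \le \lambda, \quad \text{for } M+1 \le \ell \le T-1. \tag{64}$$

Consider the following candidate solution: let the training energy be given by

$$\kappa_0(M) = -A\,(M+\kappa_{tot}) + \sqrt{(A^2+A)(M+\kappa_{tot})^2 - (A+1)(M+\kappa_{tot})}, \tag{65}$$

where $A = \frac{(1-\alpha)(1+\alpha^M)}{2(\alpha-\alpha^M)}$, and let the data energy be given by

$$\kappa_\ell = \begin{cases} \alpha^{\ell-1}\,\dfrac{1-\alpha}{1-\alpha^{M}}\left[\kappa_{tot}-\kappa_0(M)+M\right] - 1, & \text{for } 1 \le \ell \le M, \\[2mm] 0, & \text{for } M+1 \le \ell \le T-1. \end{cases} \tag{66}$$

Substituting equations 64 and 65 into equation 62, we see that the condition is satisfied and that

$$\lambda = \alpha^2\,\frac{\kappa_0}{1+\kappa_0}\left(\frac{1-\alpha^M}{1-\alpha}\right)^2\frac{1}{(\kappa_{tot}-\kappa_0+M)^2}$$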

(we abbreviate $\kappa_0(M)$ as $\kappa_0$ for brevity). Substituting equation 65 into 63, we see that the condition in equation 63 is satisfied only if

$$\alpha^{2(M+1)}\,\frac{\kappa_0}{1+\kappa_0} \le \lambda = \alpha^2\,\frac{\kappa_0}{1+\kappa_0}\left(\frac{1-\alpha^M}{1-\alpha}\right)^2\frac{1}{(\kappa_{tot}-\kappa_0+M)^2}$$

or, equivalently, only if

$$\kappa_{tot}-\kappa_0+M \le \frac{\alpha^{-M}-1}{1-\alpha}. \tag{67}$$




Substituting equation 64 into 66 and solving explicitly for $\kappa_0$, we see that this condition is equivalently satisfied if


Ktot  < 


a-^-1 
1  —  a 


{a  ^  —  a)  {a  —  1) 

1  — 


(68) 


Therefore, if the initial choice of M satisfies equation 67, then equations 64 and 65 are indeed the maximizers of $R_0$. Rewording: first we choose $T_a$ to be the smallest integer in $\{1, \ldots, T-1\}$ such that equation 67 holds (or, if no such integer exists, choose $T_a = T-1$). Then, we select the training and data energies as in equations 67 and 64. This establishes equation 20.
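The allocation procedure just described can be sketched as follows. Several things here are assumptions rather than statements from the report: the objective is taken to be $\frac{\kappa_0}{1+\kappa_0}\sum_{\ell}\alpha^{2\ell}\frac{\kappa_\ell}{1+\kappa_\ell}$ with $\kappa_0 + \sum_\ell \kappa_\ell = \kappa_{tot}$, the training energy is obtained from the stationarity quadratic (an equivalent form of the candidate solution that stays finite for $M = 1$), and the activation test is $\kappa_{tot}-\kappa_0+M \le (\alpha^{-M}-1)/(1-\alpha)$. All names are hypothetical.

```python
import numpy as np

def allocate(alpha, T, kappa_tot):
    """Return (T_a, kappa) with kappa[0] the training energy, kappa[1:] the data energies."""
    for M in range(1, T):
        powers = alpha ** np.arange(1, M + 1)
        S1, S2 = powers.sum(), (powers ** 2).sum()
        B = kappa_tot + M
        # Positive root of (S1^2 - S2) k0^2 + 2 S2 B k0 - (S2 B^2 - S1^2 B) = 0
        a, b, c = S1 ** 2 - S2, 2 * S2 * B, -(S2 * B ** 2 - S1 ** 2 * B)
        kappa0 = -2 * c / (b + np.sqrt(b ** 2 - 4 * a * c))
        D = kappa_tot - kappa0 + M
        # Stop at the smallest M for which slot M+1 should stay switched off
        if M == T - 1 or D <= (alpha ** (-M) - 1) / (1 - alpha):
            break
    kappa = np.zeros(T)
    kappa[0] = kappa0
    kappa[1:M + 1] = powers * D / S1 - 1      # active data slots; the rest stay at zero
    return M, kappa

def objective(kappa, alpha):
    """Assumed objective: kappa0/(1+kappa0) * sum_l alpha^(2l) kappa_l/(1+kappa_l)."""
    l = np.arange(1, len(kappa))
    return kappa[0] / (1 + kappa[0]) * np.sum(alpha ** (2 * l) * kappa[1:] / (1 + kappa[1:]))
```

For $\alpha = 0.8$, $T = 6$ and $\kappa_{tot} = 5$, this sketch activates four of the five data slots and spends about 2.0 of the total energy on training; the data energies decrease with distance from the pilot.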